Preface


The name “real analysis” is something of an anachronism. Originally applied to the 
theory of functions of a real variable, it has come to encompass several subjects of 
a more general and abstract nature that underlie much of modern analysis. These 
general theories and their applications are the subject of this book, which is intended 
primarily as a text for a graduate-level analysis course. Chapters 1 through 7 are 
devoted to the core material from measure and integration theory, point set topology, 
and functional analysis that is a part of most graduate curricula in mathematics, 
together with a few related but less standard items with which I think all analysts 
should be acquainted. The last four chapters contain a variety of topics that are meant 
to introduce some of the other branches of analysis and to illustrate the uses of the 
preceding material. I believe these topics are all interesting and important, but their 
selection in preference to others is largely a matter of personal predilection. 
The things one needs to know in order to read this book are as follows: 


1. First and foremost, the classical theory of functions of a real variable: limits and
continuity, differentiation and (Riemann) integration, infinite series, uniform 
convergence, and the notion of a metric space. 


2. The arithmetic of complex numbers and the basic properties of the complex 
exponential function $e^{x+iy} = e^x(\cos y + i\sin y)$. (More advanced results
from complex function theory are used only in the proof of the Riesz-Thorin 
theorem and in a few exercises and remarks.) 


3. Some elementary set theory. 
4. A bit of linear algebra: actually, not much beyond the definitions of vector
spaces, linear mappings, and determinants. 


All of the necessary material in (1) and (2) can be found in W. Rudin’s classic Principles
of Mathematical Analysis (3rd ed., McGraw-Hill, 1976) or its descendants such
as R. S. Strichartz’s The Way of Analysis (Jones and Bartlett, 1995) or S. G. Krantz’s
Real Analysis and Foundations (CRC Press, 1991). A summary of the relevant facts
about sets and metric spaces is provided here in Chapter 0. The reader should begin
this book by examining §0.1 and §0.5 to become familiar with my notation and
terminology; the rest of Chapter 0 can then be referred to as needed.

Each chapter concludes with a section entitled “Notes and References.” These 
sections contain miscellaneous remarks, acknowledgments of sources, indications 
of results not discussed in the text, references for further reading, and historical 
notes. The latter are quite sketchy, although references to more detailed sources are 
provided; they are intended mainly to give an idea of how the subject grew out of its 
classical origins. I found it entertaining and instructive to read some of the original 
papers, and I hope to encourage others to do the same. 

A sizable portion of this book is devoted to exercises. They are mostly in the 
form of assertions to be proved, and they range from trivial to difficult; hints and 
intermediate steps are provided for the more complicated ones. Every reader should 
peruse them, although only the most ambitious will try to work them all out. They 
serve several purposes: amplification of results and completion of proofs in the 
text, discussion of examples and counterexamples, applications of theorems, and 
development of further ideas. Instructors will probably wish to do some of the 
exercises in class; to maximize flexibility and minimize verbosity, I have followed 
the principle of “When in doubt, leave it as an exercise,” especially with regard
to examples. Exercises occur at the end of each section, but they are numbered 
consecutively within each chapter. In referring to them, “Exercise n” means the nth 
exercise in the present chapter unless another section is explicitly mentioned. 

The topics in the book are arranged so as to allow some flexibility of presentation. 
For example, Chapters 4 and 5 do not depend on Chapters 1–3 except for a few
examples and exercises. On the other hand, if one wishes to proceed quickly to $L^p$
theory, one can skip from §3.3 to §§5.1–2 and thence to Chapter 6. Chapters 10
and 11 are independent of Chapters 8 and 9 except that the ideas in §8.6 are used in 
Chapter 10. 

The new features of this edition are as follows: 


• The material on the n-dimensional Lebesgue integral (§§2.6–7) has been rearranged
and expanded.


• Tychonoff’s theorem (§4.6) is proved by an elegant argument recently discovered
by Paul Chernoff.


• The chapter on Fourier analysis has been split into two chapters (8 and 9).
The material on Fourier series and integrals (§§8.3–5) has been rearranged and
now contains the Dirichlet-Jordan theorem on convergence of Fourier series.
The material on distributions (§§9.1–2) has been extensively rewritten and
expanded.


• A section on self-similarity and Hausdorff dimension (§11.3) has been added,
replacing the outdated calculation of the Hausdorff dimension of Cantor sets
in the old §10.2.


• Innumerable small changes have been made in the hope of improving the
exposition.


The writer of a text on such a well-developed subject as real analysis must necessarily
be indebted to his predecessors. I kept a large supply of books on hand while
writing this one; they are too numerous to list here, but most of them can be found 
in the bibliography. I am also happy to acknowledge the influence of two of my 
teachers: the late Lynn Loomis, from whose lectures I first learned this subject, and 
Elias Stein, who has done much to shape my point of view. Finally, I am grateful to 
a number of people — especially Steven Krantz, Kenneth Ross, and William Faris 
— whose comments and corrigenda concerning the first edition have helped me to 
prepare the new one. 


GERALD B. FOLLAND 


Seattle, Washington 
Real Analysis





Prologue 


The purpose of this introductory chapter is to establish the notation and terminology 
that will be used throughout the book and to present a few diverse results from set 
theory and analysis that will be needed later. The style here is deliberately terse, 
since this chapter is intended as a reference rather than a systematic exposition. 


0.1 THE LANGUAGE OF SET THEORY 


It is assumed that the reader is familiar with the basic concepts of set theory; the 
following discussion is meant mainly to fix our terminology. 


Number Systems. Our notation for the fundamental number systems is as
follows:

$\mathbb{N}$ = the set of positive integers (not including zero)
$\mathbb{Z}$ = the set of integers
$\mathbb{Q}$ = the set of rational numbers
$\mathbb{R}$ = the set of real numbers
$\mathbb{C}$ = the set of complex numbers


Logic. We shall avoid the use of special symbols from mathematical logic, 
preferring to remain reasonably close to standard English. We shall, however, use 
the abbreviation iff for “if and only if.” 

One point of elementary logic that is often insufficiently appreciated by students
is the following: If A and B are mathematical assertions and $\neg A$, $\neg B$ are their
negations, the statement “A implies B” is logically equivalent to the contrapositive
statement “$\neg B$ implies $\neg A$.” Thus one may prove that A implies B by assuming $\neg B$
and deducing $\neg A$, and we shall frequently do so. This is not the same as reductio ad
absurdum, which consists of assuming both A and $\neg B$ and deriving a contradiction.


Sets. The words “family” and “collection” will be used synonymously with
“set,” usually to avoid phrases like “set of sets.” The empty set is denoted by $\varnothing$, and
the family of all subsets of a set X is denoted by $\mathcal{P}(X)$:

$$\mathcal{P}(X) = \{E : E \subset X\}.$$

Here and elsewhere, the inclusion sign $\subset$ is interpreted in the weak sense; that is, the
assertion “$E \subset X$” includes the possibility that $E = X$.
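If $\mathcal{E}$ is a family of sets, we can form the union and intersection of its members:

$$\bigcup_{E \in \mathcal{E}} E = \{x : x \in E \text{ for some } E \in \mathcal{E}\}, \qquad \bigcap_{E \in \mathcal{E}} E = \{x : x \in E \text{ for all } E \in \mathcal{E}\}.$$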


Usually it is more convenient to consider indexed families of sets:

$$\{E_\alpha\}_{\alpha \in A},$$

in which case the union and intersection are denoted by

$$\bigcup_{\alpha \in A} E_\alpha, \qquad \bigcap_{\alpha \in A} E_\alpha.$$


If $E_\alpha \cap E_\beta = \varnothing$ whenever $\alpha \neq \beta$, the sets $E_\alpha$ are called disjoint. The terms “disjoint
collection of sets” and “collection of disjoint sets” are used interchangeably, as are 
“disjoint union of sets” and “union of disjoint sets.” 

When considering families of sets indexed by $\mathbb{N}$, our usual notation will be

$$\{E_n\}_{n=1}^{\infty} \quad\text{or}\quad \{E_n\}_1^{\infty},$$


and likewise for unions and intersections. In this situation, the notions of limit
superior and limit inferior are sometimes useful:

$$\limsup E_n = \bigcap_{k=1}^{\infty} \bigcup_{n=k}^{\infty} E_n, \qquad \liminf E_n = \bigcup_{k=1}^{\infty} \bigcap_{n=k}^{\infty} E_n.$$

The reader may verify that

$$\limsup E_n = \{x : x \in E_n \text{ for infinitely many } n\},$$
$$\liminf E_n = \{x : x \in E_n \text{ for all but finitely many } n\}.$$
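A small Python sketch can make the two formulas concrete. It assumes a periodic sequence of finite sets (an illustrative choice, not from the text), so a long enough finite truncation already gives the exact answer. The helper name `set_limsup_liminf` is hypothetical.

```python
from typing import Callable, FrozenSet


def set_limsup_liminf(E: Callable[[int], FrozenSet[int]], K: int, N: int):
    """Evaluate limsup E_n = ⋂_k ⋃_{n>=k} E_n and liminf E_n = ⋃_k ⋂_{n>=k} E_n,
    with k restricted to 1..K and the inner union/intersection to n = k..N.
    If E has period p and N >= K + p - 1, every tail contains a full period,
    so the truncated values coincide with the true ones."""
    tails = [[E(n) for n in range(k, N + 1)] for k in range(1, K + 1)]
    limsup = frozenset.intersection(*[frozenset().union(*t) for t in tails])
    liminf = frozenset().union(*[t[0].intersection(*t[1:]) for t in tails])
    return limsup, liminf


def E(n: int) -> FrozenSet[int]:
    # E_n = {0, 1} for odd n and {1, 2} for even n (period 2)
    return frozenset({0, 1}) if n % 2 == 1 else frozenset({1, 2})


# limsup is {0, 1, 2} (each point lies in infinitely many E_n);
# liminf is {1} (the only point lying in all but finitely many E_n)
print(set_limsup_liminf(E, K=5, N=10))
```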
If E and F are sets, we denote their difference by $E \setminus F$:
$$E \setminus F = \{x : x \in E \text{ and } x \notin F\},$$
and their symmetric difference by $E \,\triangle\, F$:
$$E \,\triangle\, F = (E \setminus F) \cup (F \setminus E).$$


When it is clearly understood that all sets in question are subsets of a fixed set X, we
define the complement $E^c$ of a set E (in X):

$$E^c = X \setminus E.$$


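In this situation we have de Morgan’s laws:

$$\Big(\bigcup_{\alpha \in A} E_\alpha\Big)^c = \bigcap_{\alpha \in A} E_\alpha^c, \qquad \Big(\bigcap_{\alpha \in A} E_\alpha\Big)^c = \bigcup_{\alpha \in A} E_\alpha^c.$$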
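Both laws, along with the definition of the symmetric difference, can be spot-checked on random finite families inside a fixed ambient set. The Python sketch below assumes the ambient set $X = \{0, 1, \ldots, 19\}$, and its helper names are illustrative. Random testing of this kind only illustrates the laws; it does not prove them.

```python
import random
from typing import FrozenSet, List

X = frozenset(range(20))  # the fixed ambient set; every set below is a subset of X


def complement(E: FrozenSet[int]) -> FrozenSet[int]:
    """E^c = X minus E."""
    return X - E


def de_morgan_holds(family: List[FrozenSet[int]]) -> bool:
    """Check (⋃ E_a)^c = ⋂ E_a^c and (⋂ E_a)^c = ⋃ E_a^c for a finite family."""
    comps = [complement(E) for E in family]
    union = frozenset().union(*family)
    inter = X.intersection(*family)  # starting from X keeps the empty family well defined
    return (complement(union) == X.intersection(*comps)
            and complement(inter) == frozenset().union(*comps))


def symmetric_difference(E: FrozenSet[int], F: FrozenSet[int]) -> FrozenSet[int]:
    """E △ F = (E minus F) ∪ (F minus E); Python's E ^ F computes the same set."""
    return (E - F) | (F - E)


rng = random.Random(0)
families = [[frozenset(rng.sample(sorted(X), rng.randint(0, 20)))
             for _ in range(rng.randint(1, 5))]
            for _ in range(100)]
print(all(de_morgan_holds(fam) for fam in families))  # True
```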


If X and Y are sets, their Cartesian product $X \times Y$ is the set of all ordered pairs
$(x, y)$ such that $x \in X$ and $y \in Y$. A relation from X to Y is a subset of $X \times Y$.
(If $Y = X$, we speak of a relation on X.) If R is a relation from X to Y, we shall
sometimes write $xRy$ to mean that $(x, y) \in R$. The most important types of relations
are the following:


• Equivalence relations. An equivalence relation on X is a relation R on X
such that

$xRx$ for all $x \in X$,
$xRy$ iff $yRx$,
$xRz$ whenever $xRy$ and $yRz$ for some $y$.

The equivalence class of an element x is $\{y \in X : xRy\}$. X is the disjoint
union of these equivalence classes.


• Orderings. See §0.2.


• Mappings. A mapping $f : X \to Y$ is a relation R from X to Y with the
property that for every $x \in X$ there is a unique $y \in Y$ such that $xRy$, in which
case we write $y = f(x)$. Mappings are sometimes called maps or functions;
we shall generally reserve the latter name for the case when Y is $\mathbb{C}$ or some
subset thereof.
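The first item in the list above says that an equivalence relation splits X into disjoint classes. The Python sketch below illustrates this with a relation of our own choosing: $xRy$ iff $x - y$ is divisible by 3, on $\{0, 1, \ldots, 9\}$. The helper names are hypothetical. The code checks the three axioms by brute force and then builds the classes.

```python
from typing import Callable, Iterable, List, Set


def is_equivalence(X: Iterable[int], related: Callable[[int, int], bool]) -> bool:
    """Brute-force check of reflexivity, symmetry and transitivity on a finite X."""
    X = list(X)
    reflexive = all(related(x, x) for x in X)
    symmetric = all(related(x, y) == related(y, x) for x in X for y in X)
    transitive = all(related(x, z)
                     for x in X for y in X for z in X
                     if related(x, y) and related(y, z))
    return reflexive and symmetric and transitive


def equivalence_classes(X: Iterable[int], related: Callable[[int, int], bool]) -> List[Set[int]]:
    classes: List[Set[int]] = []
    for x in X:
        for cls in classes:
            # by symmetry and transitivity, comparing with one member of a class suffices
            if related(x, next(iter(cls))):
                cls.add(x)
                break
        else:
            classes.append({x})
    return classes


def same_residue(x: int, y: int) -> bool:
    return (x - y) % 3 == 0


X = range(10)
classes = equivalence_classes(X, same_residue)
print(sorted(sorted(c) for c in classes))  # [[0, 3, 6, 9], [1, 4, 7], [2, 5, 8]]
```

By contrast, the relation "$|x - y| \leq 1$" fails `is_equivalence`: it relates 0 to 1 and 1 to 2, but not 0 to 2.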


If $f : X \to Y$ and $g : Y \to Z$ are mappings, we denote by $g \circ f$ their composition:
$$g \circ f : X \to Z, \qquad g \circ f(x) = g(f(x)).$$


If $D \subset X$ and $E \subset Y$, we define the image of D and the inverse image of E
under a mapping $f : X \to Y$ by

$$f(D) = \{f(x) : x \in D\}, \qquad f^{-1}(E) = \{x : f(x) \in E\}.$$
It is easily verified that the map $f^{-1} : \mathcal{P}(Y) \to \mathcal{P}(X)$ defined by the second formula
commutes with unions, intersections, and complements:

$$f^{-1}\Big(\bigcup_{\alpha \in A} E_\alpha\Big) = \bigcup_{\alpha \in A} f^{-1}(E_\alpha), \qquad f^{-1}\Big(\bigcap_{\alpha \in A} E_\alpha\Big) = \bigcap_{\alpha \in A} f^{-1}(E_\alpha),$$
$$f^{-1}(E^c) = \big(f^{-1}(E)\big)^c.$$

(The direct image mapping $f : \mathcal{P}(X) \to \mathcal{P}(Y)$ commutes with unions, but in general
not with intersections or complements.)
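The contrast between inverse and direct images can be checked on a tiny example. In this Python sketch, the map f and the helper names are illustrative and not from the text. A finite map is stored as a dict from X to Y, and f is chosen to be neither injective nor surjective.

```python
def image(f: dict, D: frozenset) -> frozenset:
    """f(D) = {f(x) : x in D}."""
    return frozenset(f[x] for x in D)


def preimage(f: dict, E: frozenset) -> frozenset:
    """f^{-1}(E) = {x : f(x) in E}."""
    return frozenset(x for x in f if f[x] in E)


f = {1: "a", 2: "a", 3: "b"}    # not injective: 1 and 2 both map to "a"
X = frozenset(f)
Y = frozenset({"a", "b", "c"})  # "c" is not in the range, so f is not surjective

E1, E2 = frozenset({"a"}), frozenset({"b", "c"})
D1, D2 = frozenset({1}), frozenset({2})

# f(D1 ∩ D2) is empty, while f(D1) ∩ f(D2) = {"a"}
print(image(f, D1 & D2), image(f, D1) & image(f, D2))
```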

If $f : X \to Y$ is a mapping, X is called the domain of f and $f(X)$ is called the
range of f. f is said to be injective if $f(x_1) = f(x_2)$ only when $x_1 = x_2$, surjective
if $f(X) = Y$, and bijective if it is both injective and surjective. If f is bijective, it
has an inverse $f^{-1} : Y \to X$ such that $f^{-1} \circ f$ and $f \circ f^{-1}$ are the identity mappings
on X and Y, respectively. If $A \subset X$, we denote by $f|A$ the restriction of f to A:

$$(f|A) : A \to Y, \qquad (f|A)(x) = f(x) \text{ for } x \in A.$$


A sequence in a set X is a mapping from $\mathbb{N}$ into X. (We also use the term finite
sequence to mean a map from $\{1, \ldots, n\}$ into X where $n \in \mathbb{N}$.) If $f : \mathbb{N} \to X$ is a
sequence and $g : \mathbb{N} \to \mathbb{N}$ satisfies $g(n) < g(m)$ whenever $n < m$, the composition
$f \circ g$ is called a subsequence of f. It is common, and often convenient, to be careless
about distinguishing between sequences and their ranges, which are subsets of X
indexed by $\mathbb{N}$. Thus, if $f(n) = x_n$, we speak of the sequence $\{x_n\}_1^{\infty}$; whether we
mean a mapping from $\mathbb{N}$ to X or a subset of X will be clear from the context.

Earlier we defined the Cartesian product of two sets. Similarly one can define the
Cartesian product of n sets in terms of ordered n-tuples. However, this definition
becomes awkward for infinite families of sets, so the following approach is used
instead. If $\{X_\alpha\}_{\alpha \in A}$ is an indexed family of sets, their Cartesian product $\prod_{\alpha \in A} X_\alpha$
is the set of all maps $f : A \to \bigcup_{\alpha \in A} X_\alpha$ such that $f(\alpha) \in X_\alpha$ for every $\alpha \in A$. (It
should be noted, and then promptly forgotten, that when $A = \{1, 2\}$, the previous
definition of $X_1 \times X_2$ is set-theoretically different from the present definition of
$\prod_1^2 X_j$. Indeed, the latter concept depends on mappings, which are defined in terms
of the former one.) If $X = \prod_{\alpha \in A} X_\alpha$ and $\alpha \in A$, we define the $\alpha$th projection or
coordinate map $\pi_\alpha : X \to X_\alpha$ by $\pi_\alpha(f) = f(\alpha)$. We also frequently write x and
$x_\alpha$ instead of f and $f(\alpha)$ and call $x_\alpha$ the $\alpha$th coordinate of x.

If the sets $X_\alpha$ are all equal to some fixed set Y, we denote $\prod_{\alpha \in A} X_\alpha$ by $Y^A$:

$$Y^A = \text{the set of all mappings from } A \text{ to } Y.$$

If $A = \{1, \ldots, n\}$, $Y^A$ is denoted by $Y^n$ and may be identified with the set of ordered
n-tuples of elements of Y.


0.2 ORDERINGS 


A partial ordering on a nonempty set X is a relation R on X with the following 
properties: 
• if $xRy$ and $yRz$, then $xRz$;
• if $xRy$ and $yRx$, then $x = y$;
• $xRx$ for all x.
If R also satisfies
• if $x, y \in X$, then either $xRy$ or $yRx$,


then R is called a linear (or total) ordering. For example, if E is any set, then $\mathcal{P}(E)$
is partially ordered by inclusion, and $\mathbb{R}$ is linearly ordered by its usual ordering.
Taking this last example as a model, we shall usually denote partial orderings by
$\leq$, and we write $x < y$ to mean that $x \leq y$ but $x \neq y$. We observe that a partial
ordering on X naturally induces a partial ordering on every nonempty subset of X.
Two partially ordered sets X and Y are said to be order isomorphic if there is a
bijection $f : X \to Y$ such that $x_1 \leq x_2$ iff $f(x_1) \leq f(x_2)$.

If X is partially ordered by $\leq$, a maximal (resp. minimal) element of X is an
element $x \in X$ such that the only $y \in X$ satisfying $x \leq y$ (resp. $x \geq y$) is x itself.
Maximal and minimal elements may or may not exist, and they need not be unique
unless the ordering is linear. If $E \subset X$, an upper (resp. lower) bound for E is an
element $x \in X$ such that $y \leq x$ (resp. $x \leq y$) for all $y \in E$. An upper bound for E
need not be an element of E, and unless E is linearly ordered, a maximal element of
E need not be an upper bound for E. (The reader should think up some examples.)
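Here is one example of the kind the reader is invited to find, chosen by us for illustration: the integers 2 through 12, partially ordered by divisibility. The brute-force Python helpers below use hypothetical names. They show that maximal elements need not be unique and that a maximal element need not be an upper bound.

```python
def divides(a: int, b: int) -> bool:
    """The partial ordering a ≤ b iff a divides b."""
    return b % a == 0


def maximal_elements(X, leq):
    """x is maximal iff the only y in X with x ≤ y is x itself."""
    return {x for x in X if all(y == x for y in X if leq(x, y))}


def upper_bounds(E, X, leq):
    """All x in X with y ≤ x for every y in E."""
    return {x for x in X if all(leq(y, x) for y in E)}


X = set(range(2, 13))
print(sorted(maximal_elements(X, divides)))      # [7, 8, 9, 10, 11, 12]
print(sorted(upper_bounds({2, 3}, X, divides)))  # [6, 12]
print(upper_bounds(X, X, divides))               # set(): six maximal elements, no upper bound
```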

If X is linearly ordered by $\leq$ and every nonempty subset of X has a (necessarily
unique) minimal element, X is said to be well ordered by $\leq$, and (in defiance of the
laws of grammar) $\leq$ is called a well ordering on X. For example, $\mathbb{N}$ is well ordered
by its natural ordering.

We now state a fundamental principle of set theory and derive some consequences 
of it. 


0.1 The Hausdorff Maximal Principle. Every partially ordered set has a maximal 
linearly ordered subset. 


In more detail, this means that if X is partially ordered by $\leq$, there is a set $E \subset X$
that is linearly ordered by $\leq$, such that no subset of X that properly includes E is
linearly ordered by $\leq$. Another version of this principle is the following:


0.2 Zorn’s Lemma. If X is a partially ordered set and every linearly ordered subset
of X has an upper bound, then X has a maximal element. 


Clearly the Hausdorff maximal principle implies Zorn’s lemma: An upper bound 
for a maximal linearly ordered subset of X is a maximal element of X. It is also not 
difficult to see that Zorn’s lemma implies the Hausdorff maximal principle. (Apply 
Zorn’s lemma to the collection of linearly ordered subsets of X, which is partially 
ordered by inclusion.) 


0.3 The Well Ordering Principle. Every nonempty set X can be well ordered. 
Proof. Let $\mathcal{W}$ be the collection of well orderings of subsets of X, and define a
partial ordering on $\mathcal{W}$ as follows. If $\leq_1$ and $\leq_2$ are well orderings on the subsets
$E_1$ and $E_2$, then $\leq_1$ precedes $\leq_2$ in the partial ordering if (i) $\leq_2$ extends $\leq_1$, i.e.,
$E_1 \subset E_2$ and $\leq_1$ and $\leq_2$ agree on $E_1$, and (ii) if $x \in E_2 \setminus E_1$ then $y <_2 x$ for all
$y \in E_1$. The reader may verify that the hypotheses of Zorn’s lemma are satisfied, so
that $\mathcal{W}$ has a maximal element. This must be a well ordering on X itself, for if $\leq$ is
a well ordering on a proper subset E of X and $x_0 \in X \setminus E$, then $\leq$ can be extended
to a well ordering on $E \cup \{x_0\}$ by declaring that $x \leq x_0$ for all $x \in E$. ∎


0.4 The Axiom of Choice. If $\{X_\alpha\}_{\alpha \in A}$ is a nonempty collection of nonempty sets,
then $\prod_{\alpha \in A} X_\alpha$ is nonempty.


Proof. Let $X = \bigcup_{\alpha \in A} X_\alpha$. Pick a well ordering on X and, for $\alpha \in A$, let $f(\alpha)$
be the minimal element of $X_\alpha$. Then $f \in \prod_{\alpha \in A} X_\alpha$. ∎


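0.5 Corollary. If $\{X_\alpha\}_{\alpha \in A}$ is a disjoint collection of nonempty sets, there is a set
$Y \subset \bigcup_{\alpha \in A} X_\alpha$ such that $Y \cap X_\alpha$ contains precisely one element for each $\alpha \in A$.


Proof. Take $Y = f(A)$ where $f \in \prod_{\alpha \in A} X_\alpha$. ∎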


We have deduced the axiom of choice from the Hausdorff maximal principle; in 
fact, it can be shown that the two are logically equivalent. 
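The proof of 0.4 chooses from each $X_\alpha$ its least element under a well ordering. For sets of positive integers, the usual ordering of $\mathbb{N}$ is already a well ordering, so Python's `min` gives a choice function directly. This is only a finite illustration: for finitely many sets the axiom of choice is not actually needed. `choice_function` is a hypothetical helper.

```python
from typing import Dict, Set


def choice_function(family: Dict[str, Set[int]]) -> Dict[str, int]:
    """Return f with f(a) = min X_a, the least element of X_a in the usual
    well ordering of the positive integers; f lies in the product of the X_a."""
    if any(not Xa for Xa in family.values()):
        raise ValueError("every X_a must be nonempty")
    return {a: min(Xa) for a, Xa in family.items()}


# a disjoint family, as in Corollary 0.5
family = {"evens": {8, 2, 4}, "odds": {9, 3, 5}, "big": {100, 50}}
f = choice_function(family)
Y = set(f.values())  # Y = f(A) meets each X_a in exactly one point
print(f)  # {'evens': 2, 'odds': 3, 'big': 50}
```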


0.3 CARDINALITY 


If X and Y are nonempty sets, we define the expressions

$$\operatorname{card}(X) \leq \operatorname{card}(Y), \qquad \operatorname{card}(X) = \operatorname{card}(Y), \qquad \operatorname{card}(X) \geq \operatorname{card}(Y)$$

to mean that there exists $f : X \to Y$ which is injective, bijective, or surjective,
respectively. We also define

$$\operatorname{card}(X) < \operatorname{card}(Y), \qquad \operatorname{card}(X) > \operatorname{card}(Y)$$

to mean that there is an injection but no bijection, or a surjection but no bijection,
from X to Y. Observe that we attach no meaning to the expression “card(X)” when
it stands alone; there are various ways of doing so, but they are irrelevant for our
purposes (except when X is finite; see below). These relationships can be extended
to the empty set by declaring that

$$\operatorname{card}(\varnothing) < \operatorname{card}(X) \quad\text{and}\quad \operatorname{card}(X) > \operatorname{card}(\varnothing) \quad\text{for all } X \neq \varnothing.$$


For the remainder of this section we assume implicitly that all sets in question are
nonempty in order to avoid special arguments for $\varnothing$. Our first task is to prove that
the relationships defined above enjoy the properties that the notation suggests.
0.6 Proposition. $\operatorname{card}(X) \leq \operatorname{card}(Y)$ iff $\operatorname{card}(Y) \geq \operatorname{card}(X)$.


Proof. If $f : X \to Y$ is injective, pick $x_0 \in X$ and define $g : Y \to X$ by
$g(y) = f^{-1}(y)$ if $y \in f(X)$, $g(y) = x_0$ otherwise. Then g is surjective. Conversely,
if $g : Y \to X$ is surjective, the sets $g^{-1}(\{x\})$ ($x \in X$) are nonempty and disjoint, so
any $f \in \prod_{x \in X} g^{-1}(\{x\})$ is an injection from X to Y. ∎


0.7 Proposition. For any sets X and Y, either $\operatorname{card}(X) \leq \operatorname{card}(Y)$ or $\operatorname{card}(Y) \leq
\operatorname{card}(X)$.


Proof. Consider the set $\mathcal{J}$ of all injections from subsets of X to Y. The members
of $\mathcal{J}$ can be regarded as subsets of $X \times Y$, so $\mathcal{J}$ is partially ordered by inclusion. It is
easily verified that Zorn’s lemma applies, so $\mathcal{J}$ has a maximal element f, with (say)
domain A and range B. If $x_0 \in X \setminus A$ and $y_0 \in Y \setminus B$, then f can be extended
to an injection from $A \cup \{x_0\}$ to $B \cup \{y_0\}$ by setting $f(x_0) = y_0$, contradicting
maximality. Hence either $A = X$, in which case $\operatorname{card}(X) \leq \operatorname{card}(Y)$, or $B = Y$, in
which case $f^{-1}$ is an injection from Y to X and $\operatorname{card}(Y) \leq \operatorname{card}(X)$. ∎


0.8 The Schröder-Bernstein Theorem. If $\operatorname{card}(X) \leq \operatorname{card}(Y)$ and $\operatorname{card}(Y) \leq
\operatorname{card}(X)$, then $\operatorname{card}(X) = \operatorname{card}(Y)$.


Proof. Let $f : X \to Y$ and $g : Y \to X$ be injections. Consider a point $x \in X$:
If $x \in g(Y)$, we form $g^{-1}(x) \in Y$; if $g^{-1}(x) \in f(X)$, we form $f^{-1}(g^{-1}(x))$; and
so forth. Either this process can be continued indefinitely, or it terminates with an
element of $X \setminus g(Y)$ (perhaps x itself), or it terminates with an element of $Y \setminus f(X)$.
In these three cases we say that x is in $X_\infty$, $X_X$, or $X_Y$; thus X is the disjoint union
of $X_\infty$, $X_X$, and $X_Y$. In the same way, Y is the disjoint union of three sets $Y_\infty$, $Y_X$,
and $Y_Y$. Clearly f maps $X_\infty$ onto $Y_\infty$ and $X_X$ onto $Y_X$, whereas g maps $Y_Y$ onto
$X_Y$. Therefore, if we define $h : X \to Y$ by $h(x) = f(x)$ if $x \in X_\infty \cup X_X$ and
$h(x) = g^{-1}(x)$ if $x \in X_Y$, then h is bijective. ∎
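The chain-tracing construction in this proof can be carried out mechanically when the inverses are computable. The Python sketch below uses a hypothetical helper that is not from the text. It assumes every chain terminates, i.e. $X_\infty = \varnothing$, and raises an error otherwise. In the example, $X = Y = \mathbb{N}$ and $f(n) = g(n) = 2n$, which are injective but not surjective. Writing $x = 2^k m$ with m odd, the construction gives $h(x) = 2x$ for even k and $h(x) = x/2$ for odd k.

```python
from typing import Callable, Optional


def schroeder_bernstein_h(x: int,
                          f: Callable[[int], int],
                          g_inv: Callable[[int], Optional[int]],
                          f_inv: Callable[[int], Optional[int]],
                          max_steps: int = 10_000) -> int:
    """Return h(x) from the proof of Theorem 0.8.

    g_inv (resp. f_inv) returns None when its argument is not in g(Y) (resp. f(X)).
    Assumes the backward chain from x terminates (x is not in X_inf)."""
    point, in_X = x, True
    for _ in range(max_steps):
        prev = g_inv(point) if in_X else f_inv(point)
        if prev is None:
            # The chain stops in X \ g(Y) (x in X_X) or in Y \ f(X) (x in X_Y).
            return f(x) if in_X else g_inv(x)
        point, in_X = prev, not in_X
    raise RuntimeError("chain did not terminate; x may lie in X_inf")


def double(n: int) -> int:
    return 2 * n


def half_if_even(m: int) -> Optional[int]:
    return m // 2 if m % 2 == 0 else None


def h(x: int) -> int:
    # here f = g = double, so both inverses are half_if_even
    return schroeder_bernstein_h(x, double, half_if_even, half_if_even)


print([h(x) for x in range(1, 9)])  # [2, 1, 6, 8, 10, 3, 14, 4]
```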


0.9 Proposition. For any set X, card(X) < card(P(X)). 


Proof. On the one hand, the map $f(x) = \{x\}$ is an injection from X to $\mathcal{P}(X)$.
On the other, if $g : X \to \mathcal{P}(X)$, let $Y = \{x \in X : x \notin g(x)\}$. Then $Y \notin g(X)$, for
if $Y = g(x_0)$ for some $x_0 \in X$, any attempt to answer the question “Is $x_0 \in Y$?”
quickly leads to an absurdity. Hence g cannot be surjective. ∎
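For a finite set, the diagonal argument can be checked exhaustively. The Python sketch below uses hypothetical helper names. It enumerates every map $g : X \to \mathcal{P}(X)$ for a three-element X (there are $8^3 = 512$ of them) and confirms that the set $Y = \{x : x \notin g(x)\}$ is never in the range of g.

```python
from itertools import product
from typing import Dict, FrozenSet, List, Sequence


def all_subsets(X: Sequence[int]) -> List[FrozenSet[int]]:
    """All 2**len(X) subsets of X."""
    return [frozenset(x for x, keep in zip(X, bits) if keep)
            for bits in product([False, True], repeat=len(X))]


def diagonal_set(X: Sequence[int], g: Dict[int, FrozenSet[int]]) -> FrozenSet[int]:
    """Y = {x in X : x not in g(x)}, the set used in the proof."""
    return frozenset(x for x in X if x not in g[x])


def count_maps_missing_diagonal(X: Sequence[int]) -> int:
    """Count the maps g: X -> P(X) whose range misses the diagonal set."""
    subsets = all_subsets(X)
    count = 0
    for images in product(subsets, repeat=len(X)):
        g = dict(zip(X, images))
        if diagonal_set(X, g) not in g.values():
            count += 1
    return count


print(count_maps_missing_diagonal([0, 1, 2]))  # 512, i.e. every map
```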


A set X is called countable (or denumerable) if $\operatorname{card}(X) \leq \operatorname{card}(\mathbb{N})$. In
particular, all finite sets are countable, and for these it is convenient to interpret
“card(X)” as the number of elements in X:

$$\operatorname{card}(X) = n \quad\text{iff}\quad \operatorname{card}(X) = \operatorname{card}(\{1, \ldots, n\}).$$


If X is countable but not finite, we say that X is countably infinite. 
0.10 Proposition.
a. If X and Y are countable, so is $X \times Y$.
b. If A is countable and $X_\alpha$ is countable for every $\alpha \in A$, then $\bigcup_{\alpha \in A} X_\alpha$ is
countable.
c. If X is countably infinite, then $\operatorname{card}(X) = \operatorname{card}(\mathbb{N})$.


Proof. To prove (a) it suffices to prove that $\mathbb{N}^2$ is countable. But we can define
a bijection from $\mathbb{N}$ to $\mathbb{N}^2$ by listing, for n successively equal to 2, 3, 4, ..., those
elements $(j, k) \in \mathbb{N}^2$ such that $j + k = n$ in order of increasing j, thus:

$$(1,1),\ (1,2),\ (2,1),\ (1,3),\ (2,2),\ (3,1),\ (1,4),\ (2,3),\ (3,2),\ (4,1),\ \ldots$$

As for (b), for each $\alpha \in A$ there is a surjective $f_\alpha : \mathbb{N} \to X_\alpha$, and then the map
$f : \mathbb{N} \times A \to \bigcup_{\alpha \in A} X_\alpha$ defined by $f(n, \alpha) = f_\alpha(n)$ is surjective; the result therefore
follows from (a). Finally, for (c) it suffices to assume that X is an infinite subset
of $\mathbb{N}$. Let $f(1)$ be the smallest element of X, and define $f(n)$ inductively to be the
smallest element of $X \setminus \{f(1), \ldots, f(n-1)\}$. Then f is easily seen to be a bijection
from $\mathbb{N}$ to X. ∎
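The listing of $\mathbb{N}^2$ used for (a) has a closed-form position. The pair (j, k) with $j + k = n$ is preceded by $1 + 2 + \cdots + (n-2) = (n-2)(n-1)/2$ pairs from earlier diagonals. The Python sketch below, with function names of our own choosing, implements the listing and this inverse and checks that they agree.

```python
def pair_at(i: int) -> tuple:
    """The i-th pair (1-indexed) in the list (1,1), (1,2), (2,1), (1,3), ..."""
    n = 2
    while i > n - 1:  # the diagonal j + k = n contains n - 1 pairs
        i -= n - 1
        n += 1
    return (i, n - i)


def index_of(j: int, k: int) -> int:
    """Position of (j, k) in the same list."""
    n = j + k
    return (n - 2) * (n - 1) // 2 + j


print([pair_at(i) for i in range(1, 11)])
```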


0.11 Corollary. Z and Q are countable. 


Proof. $\mathbb{Z}$ is the union of the countable sets $\mathbb{N}$, $\{-n : n \in \mathbb{N}\}$, and $\{0\}$, and one
can define a surjection $f : \mathbb{Z}^2 \to \mathbb{Q}$ by $f(m, n) = m/n$ if $n \neq 0$ and $f(m, 0) = 0$. ∎

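A set X is said to have the cardinality of the continuum if $\operatorname{card}(X) = \operatorname{card}(\mathbb{R})$.
We shall use the letter $\mathfrak{c}$ as an abbreviation for $\operatorname{card}(\mathbb{R})$:

$$\operatorname{card}(X) = \mathfrak{c} \quad\text{iff}\quad \operatorname{card}(X) = \operatorname{card}(\mathbb{R}).$$

0.12 Proposition. $\operatorname{card}(\mathcal{P}(\mathbb{N})) = \mathfrak{c}$.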


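Proof. If $A \subset \mathbb{N}$, define $f(A) \in \mathbb{R}$ to be $\sum_{n \in A} 2^{-n}$ if $\mathbb{N} \setminus A$ is infinite and
$1 + \sum_{n \in A} 2^{-n}$ if $\mathbb{N} \setminus A$ is finite. (In the two cases, $f(A)$ is the number whose base-2
decimal expansion is $0.a_1a_2\cdots$ or $1.a_1a_2\cdots$, where $a_n = 1$ if $n \in A$ and $a_n = 0$
otherwise.) Then $f : \mathcal{P}(\mathbb{N}) \to \mathbb{R}$ is injective. On the other hand, define $g : \mathcal{P}(\mathbb{Z}) \to \mathbb{R}$
by $g(A) = \log\big(\sum_{n \in A} 2^{-n}\big)$ if A is nonempty and bounded below and $g(A) = 0$ otherwise. Then
g is surjective since every positive real number has a base-2 decimal expansion.
Since $\operatorname{card}(\mathcal{P}(\mathbb{Z})) = \operatorname{card}(\mathcal{P}(\mathbb{N}))$, the result follows from the Schröder-Bernstein
theorem. ∎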


0.13 Corollary. If $\operatorname{card}(X) \geq \mathfrak{c}$, then X is uncountable.


Proof. Apply Proposition 0.9. ∎


The converse of this corollary is the so-called continuum hypothesis, whose validity
is one of the famous undecidable problems of set theory; see §0.7.
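0.14 Proposition.
a. If $\operatorname{card}(X) \leq \mathfrak{c}$ and $\operatorname{card}(Y) \leq \mathfrak{c}$, then $\operatorname{card}(X \times Y) \leq \mathfrak{c}$.
b. If $\operatorname{card}(A) \leq \mathfrak{c}$ and $\operatorname{card}(X_\alpha) \leq \mathfrak{c}$ for all $\alpha \in A$, then $\operatorname{card}\big(\bigcup_{\alpha \in A} X_\alpha\big) \leq \mathfrak{c}$.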


Proof. For (a) it suffices to take $X = Y = \mathcal{P}(\mathbb{N})$. Define $\phi, \psi : \mathbb{N} \to \mathbb{N}$ by
$\phi(n) = 2n$ and $\psi(n) = 2n - 1$. It is then easy to check that the map $f : \mathcal{P}(\mathbb{N})^2 \to
\mathcal{P}(\mathbb{N})$ defined by $f(A, B) = \phi(A) \cup \psi(B)$ is bijective. (b) follows from (a) as in the
proof of Proposition 0.10. ∎
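The bijection $f(A, B) = \phi(A) \cup \psi(B)$ sends A to even numbers and B to odd numbers, and its inverse reads them back off. The Python sketch below uses hypothetical helper names. It works with finite subsets of $\mathbb{N}$ only, since infinite subsets cannot be stored as Python sets.

```python
def interleave(A: frozenset, B: frozenset) -> frozenset:
    """f(A, B) = φ(A) ∪ ψ(B) with φ(n) = 2n and ψ(n) = 2n - 1."""
    return frozenset(2 * n for n in A) | frozenset(2 * n - 1 for n in B)


def split(C: frozenset) -> tuple:
    """Inverse of interleave: the even elements give A, the odd ones give B."""
    A = frozenset(m // 2 for m in C if m % 2 == 0)
    B = frozenset((m + 1) // 2 for m in C if m % 2 == 1)
    return A, B


C = interleave(frozenset({1, 3}), frozenset({2}))
print(sorted(C))  # [2, 3, 6]
```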


0.4 MORE ABOUT WELL ORDERED SETS 


The material in this section is optional; it is used only in a few exercises and in some 
notes at the ends of chapters. 

Let X be a well ordered set. If $A \subset X$ is nonempty, A has a minimal element,
which is its maximal lower bound or infimum; we shall denote it by inf A. If A is
bounded above, it also has a minimal upper bound or supremum, denoted by sup A.
If $x \in X$, we define the initial segment of x to be

$$I_x = \{y \in X : y < x\}.$$


The elements of $I_x$ are called predecessors of x.
The principle of mathematical induction is equivalent to the fact that N is well 
ordered. It can be extended to arbitrary well ordered sets as follows: 


0.15 The Principle of Transfinite Induction. Let X be a well ordered set. If A is
a subset of X such that $x \in A$ whenever $I_x \subset A$, then $A = X$.


Proof. If $X \neq A$, let $x = \inf(X \setminus A)$. Then $I_x \subset A$ but $x \notin A$. ∎


0.16 Proposition. If X is well ordered and $A \subset X$, then $\bigcup_{x \in A} I_x$ is either an initial
segment or X itself.


Proof. Let $J = \bigcup_{x \in A} I_x$. If $J \neq X$, let $b = \inf(X \setminus J)$. If there existed $y \in J$
with $y > b$, we would have $y \in I_x$ for some $x \in A$ and hence $b \in I_x \subset J$, contrary to
construction. Hence $J \subset I_b$, and it is obvious that $I_b \subset J$. ∎


0.17 Proposition. If X and Y are well ordered, then either X is order isomorphic
to Y, or X is order isomorphic to an initial segment in Y, or Y is order isomorphic 
to an initial segment in X. 


Proof. Consider the set $\mathcal{F}$ of order isomorphisms whose domains are initial
segments in X or X itself and whose ranges are initial segments in Y or Y itself.
$\mathcal{F}$ is nonempty since the unique $f : \{\inf X\} \to \{\inf Y\}$ belongs to $\mathcal{F}$, and $\mathcal{F}$ is
partially ordered by inclusion (its members being regarded as subsets of $X \times Y$).
An application of Zorn’s lemma shows that $\mathcal{F}$ has a maximal element f, with (say)
domain A and range B. If $A = I_x$ and $B = I_y$, then $A \cup \{x\}$ and $B \cup \{y\}$ are
again initial segments of X and Y (or X and Y themselves), and f could be extended
by setting $f(x) = y$, contradicting maximality. Hence either $A = X$ or $B = Y$ (or both),
and the result follows. ∎


0.18 Proposition. There is an uncountable well ordered set $\Omega$ such that $I_x$ is countable
for each $x \in \Omega$. If $\Omega'$ is another set with the same properties, then $\Omega$ and $\Omega'$ are
order isomorphic.


Proof. Uncountable well ordered sets exist by the well ordering principle; let X
be one. Either X has the desired property or there is a minimal element $x_0$ such that
$I_{x_0}$ is uncountable, in which case we can take $\Omega = I_{x_0}$. If $\Omega'$ is another such set, $\Omega'$
cannot be order isomorphic to an initial segment of $\Omega$ or vice versa, because $\Omega$ and
$\Omega'$ are uncountable while their initial segments are countable, so $\Omega$ and $\Omega'$ are order
isomorphic by Proposition 0.17. ∎


The set $\Omega$ in Proposition 0.18, which is essentially unique qua well ordered set, is
called the set of countable ordinals. It has the following remarkable property:


0.19 Proposition. Every countable subset of $\Omega$ has an upper bound.


Proof. If $A \subset \Omega$ is countable, $\bigcup_{x \in A} I_x$ is countable and hence is not all of $\Omega$.
By Proposition 0.16, there exists $y \in \Omega$ such that $\bigcup_{x \in A} I_x = I_y$, and y is thus an
upper bound for A. ∎


The set $\mathbb{N}$ of positive integers may be identified with a subset of $\Omega$ as follows. Set
$f(1) = \inf \Omega$, and proceeding inductively, set $f(n) = \inf(\Omega \setminus \{f(1), \ldots, f(n-1)\})$.
The reader may verify that f is an order isomorphism from $\mathbb{N}$ to $I_\omega$, where $\omega$ is the
minimal element of $\Omega$ such that $I_\omega$ is infinite.

It is sometimes convenient to add an extra element $\omega_1$ to $\Omega$ to form a set
$\Omega^* = \Omega \cup \{\omega_1\}$ and to extend the ordering on $\Omega$ to $\Omega^*$ by declaring that $x < \omega_1$ for all
$x \in \Omega$. $\omega_1$ is called the first uncountable ordinal. (The usual notation for $\Omega$ is $\omega_1$,
since $\omega_1$ is generally taken to be the set of countable ordinals itself.)


0.5 THE EXTENDED REAL NUMBER SYSTEM 


It is frequently useful to adjoin two extra points $\infty$ ($= +\infty$) and $-\infty$ to $\mathbb{R}$ to form the
extended real number system $\overline{\mathbb{R}} = \mathbb{R} \cup \{-\infty, \infty\}$, and to extend the usual ordering
on $\mathbb{R}$ by declaring that $-\infty < x < \infty$ for all $x \in \mathbb{R}$. The completeness of $\mathbb{R}$ can then
be stated as follows: Every subset A of $\overline{\mathbb{R}}$ has a least upper bound, or supremum,
and a greatest lower bound, or infimum, which are denoted by sup A and inf A. If
$A = \{a_1, \ldots, a_n\}$, we also write

$$\max(a_1, \ldots, a_n) = \sup A, \qquad \min(a_1, \ldots, a_n) = \inf A.$$
From completeness it follows that every sequence $\{x_n\}$ in $\overline{\mathbb{R}}$ has a limit superior
and a limit inferior:

$$\limsup x_n = \inf_{k \geq 1}\Big(\sup_{n \geq k} x_n\Big), \qquad \liminf x_n = \sup_{k \geq 1}\Big(\inf_{n \geq k} x_n\Big).$$


The sequence $\{x_n\}$ converges (in $\mathbb{R}$) iff these two numbers are equal (and finite), in
which case its limit is their common value. One can also define lim sup and lim inf
for functions $f : \mathbb{R} \to \overline{\mathbb{R}}$, for instance:

$$\limsup_{x \to a} f(x) = \inf_{\delta > 0}\Big(\sup_{0 < |x - a| < \delta} f(x)\Big).$$
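Numerically, the tail suprema and infima in these formulas can only be approximated from finitely many terms. The Python sketch below is our own illustration; the truncation length N and cutoff K are assumptions. It computes $\inf_{k \leq K} \sup_{k \leq n \leq N} x_n$ and the analogous lim inf expression for $x_n = (-1)^n(1 + 1/n)$. The lim sup and lim inf of this sequence are 1 and $-1$, so it does not converge. The approximation is reasonable only when K is much smaller than N.

```python
from typing import List, Tuple


def truncated_limsup_liminf(x: List[float], K: int) -> Tuple[float, float]:
    """Approximate lim sup and lim inf of x_1, x_2, ... from the finite list
    x = [x_1, ..., x_N], using tails k <= n <= N for k = 1, ..., K."""
    N = len(x)
    if not 1 <= K <= N:
        raise ValueError("need 1 <= K <= len(x)")
    tail_sup = x[:]  # tail_sup[i] = max(x[i:]), the sup over n >= i + 1
    tail_inf = x[:]  # tail_inf[i] = min(x[i:]), the inf over n >= i + 1
    for i in range(N - 2, -1, -1):
        tail_sup[i] = max(x[i], tail_sup[i + 1])
        tail_inf[i] = min(x[i], tail_inf[i + 1])
    return min(tail_sup[:K]), max(tail_inf[:K])


x = [(-1) ** n * (1 + 1 / n) for n in range(1, 10_001)]
print(truncated_limsup_liminf(x, K=5_000))  # close to (1, -1)
```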
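The arithmetical operations on $\mathbb{R}$ can be partially extended to $\overline{\mathbb{R}}$:

$$x \pm \infty = \pm\infty \ (x \in \mathbb{R}), \qquad \infty + \infty = \infty, \qquad -\infty - \infty = -\infty,$$
$$x \cdot (\pm\infty) = \pm\infty \ (x > 0), \qquad x \cdot (\pm\infty) = \mp\infty \ (x < 0).$$

We make no attempt to define $\infty - \infty$, but we abide by the convention that, unless
otherwise stated,

$$0 \cdot (\pm\infty) = 0.$$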


(The expression $0 \cdot \infty$ turns up now and then in measure theory, and for various
reasons its proper interpretation is almost always 0.)
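IEEE floating-point arithmetic, as exposed by Python's `math.inf`, already follows most of the rules above. However, it returns `nan` both for $\infty - \infty$ and for $0 \cdot \infty$, where the convention here is 0. The following is a minimal sketch, with hypothetical helper names, that enforces the convention and refuses to evaluate $\infty - \infty$.

```python
import math


def ext_mul(x: float, y: float) -> float:
    """Product in the extended reals with the convention 0 * (±inf) = 0.
    Plain float multiplication would give nan here. Inputs are assumed not to be nan."""
    if x == 0 or y == 0:
        return 0.0
    return x * y


def ext_add(x: float, y: float) -> float:
    """Sum in the extended reals; inf + (-inf) is left undefined, as in the text."""
    if math.isinf(x) and math.isinf(y) and x != y:
        raise ValueError("inf - inf is undefined")
    return x + y


print(0 * math.inf, ext_mul(0, math.inf))  # nan 0.0
```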
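We employ the following notation for intervals in $\overline{\mathbb{R}}$: if $-\infty \leq a < b \leq \infty$,

$$(a, b) = \{x : a < x < b\}, \qquad [a, b] = \{x : a \leq x \leq b\},$$
$$(a, b] = \{x : a < x \leq b\}, \qquad [a, b) = \{x : a \leq x < b\}.$$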


We shall occasionally encounter uncountable sums of nonnegative numbers. If X
is an arbitrary set and $f : X \to [0, \infty]$, we define $\sum_{x \in X} f(x)$ to be the supremum
of its finite partial sums:

$$\sum_{x \in X} f(x) = \sup\Big\{\sum_{x \in F} f(x) : F \subset X,\ F \text{ finite}\Big\}.$$


(Later we shall recognize this as the integral of f with respect to counting measure 
on X.) 


0.20 Proposition. Given $f : X \to [0, \infty]$, let $A = \{x : f(x) > 0\}$. If A is
uncountable, then $\sum_{x \in X} f(x) = \infty$. If A is countably infinite, then
$\sum_{x \in X} f(x) = \sum_{n=1}^{\infty} f(g(n))$ where $g : \mathbb{N} \to A$ is any bijection and the sum on the right is an
ordinary infinite series.


Proof. We have $A = \bigcup_1^\infty A_n$ where $A_n = \{x : f(x) > 1/n\}$. If A is
uncountable, then some $A_n$ must be uncountable, and $\sum_{x \in F} f(x) \geq \operatorname{card}(F)/n$ for
F a finite subset of $A_n$; it follows that $\sum_{x \in X} f(x) = \infty$. If A is countably infinite,
$g : \mathbb{N} \to A$ is a bijection, and $B_N = g(\{1, \ldots, N\})$, then every finite subset F of A
is contained in some $B_N$. Hence

$$\sum_{x \in F} f(x) \leq \sum_{n=1}^{N} f(g(n)) \leq \sum_{x \in X} f(x).$$

Taking the supremum over N, we find

$$\sum_{x \in F} f(x) \leq \sum_{n=1}^{\infty} f(g(n)) \leq \sum_{x \in X} f(x),$$

and then taking the supremum over F, we obtain the desired result. ∎


Some terminology concerning (extended) real-valued functions: A relation between
numbers that is applied to functions is understood to hold pointwise. Thus
$f \leq g$ means that $f(x) \leq g(x)$ for every x, and $\max(f, g)$ is the function whose
value at x is $\max(f(x), g(x))$. If $X \subset \mathbb{R}$ and $f : X \to \overline{\mathbb{R}}$, f is called increasing
if $f(x) \leq f(y)$ whenever $x \leq y$ and strictly increasing if $f(x) < f(y)$ whenever
$x < y$; similarly for decreasing. A function that is either increasing or decreasing is
called monotone.

If $f : \mathbb{R} \to \mathbb{R}$ is an increasing function, then f has right- and left-hand limits at
each point:

$$f(a+) = \lim_{x \searrow a} f(x) = \inf_{x > a} f(x), \qquad f(a-) = \lim_{x \nearrow a} f(x) = \sup_{x < a} f(x).$$

Moreover, the limiting values $f(\infty) = \sup_{a \in \mathbb{R}} f(a)$ and $f(-\infty) = \inf_{a \in \mathbb{R}} f(a)$
exist (possibly equal to $\pm\infty$). f is called right continuous if $f(a) = f(a+)$ for all
$a \in \mathbb{R}$ and left continuous if $f(a) = f(a-)$ for all $a \in \mathbb{R}$.

For points z in $\mathbb{R}$ or $\mathbb{C}$, $|z|$ denotes the ordinary absolute value or modulus of z,
$|a + ib| = \sqrt{a^2 + b^2}$. For points x in $\mathbb{R}^n$ or $\mathbb{C}^n$, $|x|$ denotes the Euclidean norm:

$$|x| = \Big[\sum_{j=1}^{n} |x_j|^2\Big]^{1/2}.$$

We recall that a set $U \subset \mathbb{R}$ is open if, for every $x \in U$, U includes an interval
centered at x.





0.21 Proposition. Every open set in $\mathbb{R}$ is a countable disjoint union of open intervals.


Proof. If $U$ is open, for each $x \in U$ consider the collection $\mathcal{J}_x$ of all open intervals $I$ such that $x \in I \subset U$. It is easy to check that the union of any family of open intervals containing a point in common is again an open interval, and hence $J_x = \bigcup_{I \in \mathcal{J}_x} I$ is an open interval; it is the largest element of $\mathcal{J}_x$. If $x, y \in U$ then either $J_x = J_y$ or $J_x \cap J_y = \varnothing$, for otherwise $J_x \cup J_y$ would be a larger open interval than $J_x$ in $\mathcal{J}_x$. Thus if $\mathcal{J} = \{J_x : x \in U\}$, the (distinct) members of $\mathcal{J}$ are disjoint, and $U = \bigcup_{J \in \mathcal{J}} J$. For each $J \in \mathcal{J}$, pick a rational number $f(J) \in J$. The map $f : \mathcal{J} \to \mathbb{Q}$ thus defined is injective, for if $J \ne J'$ then $J \cap J' = \varnothing$; therefore $\mathcal{J}$ is countable. ∎
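For a union of finitely many open intervals, the decomposition of Proposition 0.21 can be computed directly. The Python sketch below is only an illustration under that finiteness assumption (the proposition itself covers arbitrary open sets); the function name `components` and the representation of $(a, b)$ as a pair of floats are our own choices. It sorts the intervals by left endpoint and merges an interval into the current component exactly when the two overlap.

```python
from typing import List, Tuple

# The open interval (a, b) with a < b, represented by its endpoints.
Interval = Tuple[float, float]


def components(intervals: List[Interval]) -> List[Interval]:
    """Split a finite union of open intervals into disjoint open intervals.

    Returns the maximal open intervals (the components) whose union is the
    union of the given intervals, sorted from left to right.
    """
    result: List[Interval] = []
    for a, b in sorted(intervals):
        # Sorting by left endpoint means the new interval can only overlap
        # the last component built so far. Open intervals such as (0, 1)
        # and (1, 2) do not overlap: the common endpoint is in neither.
        if result and a < result[-1][1]:
            start, end = result[-1]
            result[-1] = (start, max(end, b))
        else:
            result.append((a, b))
    return result
```

For example, `components([(3, 4), (0, 2), (1, 2.5), (2.5, 3)])` returns `[(0, 2.5), (2.5, 3), (3, 4)]`: the points 2.5 and 3 are missing from the union, so the three components stay separate.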
0.6 METRIC SPACES


A metric on a set $X$ is a function $\rho : X \times X \to [0, \infty)$ such that
• $\rho(x, y) = 0$ iff $x = y$;
• $\rho(x, y) = \rho(y, x)$ for all $x, y \in X$;
• $\rho(x, z) \le \rho(x, y) + \rho(y, z)$ for all $x, y, z \in X$.


(Intuitively, $\rho(x, y)$ is to be interpreted as the distance from $x$ to $y$.) A set equipped with a metric is called a metric space. Some examples:


i. The Euclidean distance $\rho(x, y) = |x - y|$ is a metric on $\mathbb{R}^n$.


ii. $\rho_1(f, g) = \int_0^1 |f(x) - g(x)|\,dx$ and $\rho_\infty(f, g) = \sup_{0 \le x \le 1} |f(x) - g(x)|$ are metrics on the space of continuous functions on $[0, 1]$.


iii. If $\rho$ is a metric on $X$ and $A \subset X$, then $\rho|(A \times A)$ is a metric on $A$.


iv. If $(X_1, \rho_1)$ and $(X_2, \rho_2)$ are metric spaces, the product metric $\rho$ on $X_1 \times X_2$ is given by

$$\rho((x_1, x_2), (y_1, y_2)) = \max\big(\rho_1(x_1, y_1), \rho_2(x_2, y_2)\big).$$

Other metrics are sometimes used on $X_1 \times X_2$, for instance,

$$\rho_1(x_1, y_1) + \rho_2(x_2, y_2) \quad \text{or} \quad \big[\rho_1(x_1, y_1)^2 + \rho_2(x_2, y_2)^2\big]^{1/2}.$$

These, however, are equivalent to the product metric in the sense that we shall define at the end of this section.
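The equivalence asserted in (iv) comes down to elementary inequalities between the coordinate distances $d_1 = \rho_1(x_1, y_1)$ and $d_2 = \rho_2(x_2, y_2)$: $\max(d_1, d_2) \le d_1 + d_2 \le 2\max(d_1, d_2)$ and $\max(d_1, d_2) \le (d_1^2 + d_2^2)^{1/2} \le \sqrt{2}\max(d_1, d_2)$. A small Python check of these bounds might look as follows; it is a sketch, the helper names are hypothetical, and the floating-point tolerance is our own choice.

```python
import math
from typing import Tuple


def product_metrics(d1: float, d2: float) -> Tuple[float, float, float]:
    """Given d1 = rho_1(x1, y1) and d2 = rho_2(x2, y2), return the max
    (product) metric, the sum metric, and the Euclidean-type metric."""
    return max(d1, d2), d1 + d2, math.hypot(d1, d2)


def satisfies_equivalence_bounds(d1: float, d2: float, tol: float = 1e-12) -> bool:
    """Check  max <= sum <= 2 * max  and  max <= euclid <= sqrt(2) * max,
    up to a small tolerance for floating-point rounding."""
    m, s, e = product_metrics(d1, d2)
    slack = tol * (1.0 + m)
    return (m <= s + slack and s <= 2 * m + slack
            and m <= e + slack and e <= math.sqrt(2) * m + slack)
```

The constants $C = 1$, $C' = 2$ and $C = 1$, $C' = \sqrt{2}$ are the ones that make the sum metric and the Euclidean-type metric equivalent to the product metric.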


Let $(X, \rho)$ be a metric space. If $x \in X$ and $r > 0$, the (open) ball of radius $r$ about $x$ is

$$B(r, x) = \{y \in X : \rho(x, y) < r\}.$$


A set $E \subset X$ is open if for every $x \in E$ there exists $r > 0$ such that $B(r, x) \subset E$, and closed if its complement is open. For example, every ball $B(r, x)$ is open, for if $y \in B(r, x)$ and $\rho(x, y) = s$ then $B(r - s, y) \subset B(r, x)$. Also, $X$ and $\varnothing$ are both open and closed. Clearly the union of any family of open sets is open, and hence the intersection of any family of closed sets is closed. Also, the intersection (resp. union) of any finite family of open (resp. closed) sets is open (resp. closed). Indeed, if $U_1, \dots, U_n$ are open and $x \in \bigcap_1^n U_j$, for each $j$ there exists $r_j > 0$ such that $B(r_j, x) \subset U_j$, and then $B(r, x) \subset \bigcap_1^n U_j$ where $r = \min(r_1, \dots, r_n)$, so $\bigcap_1^n U_j$ is open.

If $E \subset X$, the union of all open sets $U \subset E$ is the largest open set contained in $E$; it is called the interior of $E$ and is denoted by $E^\circ$. Likewise, the intersection of all closed sets $F \supset E$ is the smallest closed set containing $E$; it is called the closure of $E$ and is denoted by $\overline{E}$. $E$ is said to be dense in $X$ if $\overline{E} = X$, and nowhere dense if $\overline{E}$ has empty interior. $X$ is called separable if it has a countable dense subset. (For example, $\mathbb{Q}^n$ is a countable dense subset of $\mathbb{R}^n$.) A sequence $\{x_n\}$ in $X$ converges to $x \in X$ (symbolically: $x_n \to x$ or $\lim x_n = x$) if $\lim_{n \to \infty} \rho(x_n, x) = 0$.


0.22 Proposition. If $X$ is a metric space, $E \subset X$, and $x \in X$, the following are equivalent:

a. $x \in \overline{E}$.
b. $B(r, x) \cap E \ne \varnothing$ for all $r > 0$.
c. There is a sequence $\{x_n\}$ in $E$ that converges to $x$.


Proof. If $B(r, x) \cap E = \varnothing$, then $B(r, x)^c$ is a closed set containing $E$ but not $x$, so $x \notin \overline{E}$. Conversely, if $x \notin \overline{E}$, since $(\overline{E})^c$ is open there exists $r > 0$ such that $B(r, x) \subset (\overline{E})^c \subset E^c$. Thus (a) is equivalent to (b). If (b) holds, for each $n \in \mathbb{N}$ there exists $x_n \in B(n^{-1}, x) \cap E$, so that $x_n \to x$. On the other hand, if $B(r, x) \cap E = \varnothing$, then $\rho(y, x) \ge r$ for all $y \in E$, so no sequence in $E$ can converge to $x$. Thus (b) is equivalent to (c). ∎


If $(X_1, \rho_1)$ and $(X_2, \rho_2)$ are metric spaces, a map $f : X_1 \to X_2$ is called continuous at $x \in X_1$ if for every $\epsilon > 0$ there exists $\delta > 0$ such that $\rho_2(f(y), f(x)) < \epsilon$ whenever $\rho_1(x, y) < \delta$; in other words, such that $f^{-1}(B(\epsilon, f(x))) \supset B(\delta, x)$. The map $f$ is called continuous if it is continuous at each $x \in X_1$, and uniformly continuous if, in addition, the $\delta$ in the definition of continuity can be chosen independent of $x$.


0.23 Proposition. $f : X_1 \to X_2$ is continuous iff $f^{-1}(U)$ is open in $X_1$ for every open $U \subset X_2$.


Proof. If the latter condition holds, then for every $x \in X_1$ and $\epsilon > 0$, the set $f^{-1}(B(\epsilon, f(x)))$ is open and contains $x$, so it contains some ball about $x$; this means that $f$ is continuous at $x$. Conversely, suppose that $f$ is continuous and $U$ is open in $X_2$. For each $y \in U$ there exists $\epsilon_y > 0$ such that $B(\epsilon_y, y) \subset U$, and for each $x \in f^{-1}(\{y\})$ there exists $\delta_x > 0$ such that $B(\delta_x, x) \subset f^{-1}(B(\epsilon_y, y)) \subset f^{-1}(U)$. Thus $f^{-1}(U) = \bigcup_{x \in f^{-1}(U)} B(\delta_x, x)$ is open. ∎


A sequence $\{x_n\}$ in a metric space $(X, \rho)$ is called Cauchy if $\rho(x_n, x_m) \to 0$ as $n, m \to \infty$. A subset $E$ of $X$ is called complete if every Cauchy sequence in $E$ converges and its limit is in $E$. For example, $\mathbb{R}^n$ (with the Euclidean metric) is complete, whereas $\mathbb{Q}^n$ is not.
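To see concretely why $\mathbb{Q}$ is not complete, one can run Newton's iteration $x_{k+1} = (x_k + 2/x_k)/2$ for $\sqrt{2}$ in exact rational arithmetic. The sketch below uses Python's standard `fractions.Fraction` (the function name is ours) and produces rationals whose successive gaps shrink rapidly. Since the iterates converge in $\mathbb{R}$ to $\sqrt{2}$, they form a Cauchy sequence in $\mathbb{Q}$, yet their only possible limit is irrational; the code, of course, inspects only finitely many terms.

```python
from fractions import Fraction
from typing import List


def sqrt2_iterates(n: int) -> List[Fraction]:
    """First n Newton iterates x_{k+1} = (x_k + 2/x_k) / 2, starting from
    x_0 = 1, computed exactly as rational numbers."""
    x = Fraction(1)
    seq = [x]
    for _ in range(n - 1):
        x = (x + 2 / x) / 2
        seq.append(x)
    return seq
```

The first iterates are $1, 3/2, 17/12, 577/408$, and no iterate squares to exactly $2$.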


0.24 Proposition. A closed subset of a complete metric space is complete, and a 
complete subset of an arbitrary metric space is closed. 


Proof. If $X$ is complete, $E \subset X$ is closed, and $\{x_n\}$ is a Cauchy sequence in $E$, $\{x_n\}$ has a limit $x$ in $X$. By Proposition 0.22, $x \in \overline{E} = E$. If $E \subset X$ is complete and $x \in \overline{E}$, by Proposition 0.22 there is a sequence $\{x_n\}$ in $E$ converging to $x$. $\{x_n\}$ is Cauchy, so its limit lies in $E$; thus $\overline{E} = E$. ∎
In a metric space $(X, \rho)$ we can define the distance from a point to a set and the distance between two sets. Namely, if $x \in X$ and $E, F \subset X$,

$$\rho(x, E) = \inf\{\rho(x, y) : y \in E\},$$

$$\rho(E, F) = \inf\{\rho(x, y) : x \in E,\ y \in F\} = \inf\{\rho(x, F) : x \in E\}.$$


Observe that, by Proposition 0.22, $\rho(x, E) = 0$ iff $x \in \overline{E}$. We also define the diameter of $E \subset X$ to be

$$\operatorname{diam} E = \sup\{\rho(x, y) : x, y \in E\}.$$

$E$ is called bounded if $\operatorname{diam} E < \infty$.
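When $E$ and $F$ are finite, the infima and the supremum above are attained, so $\rho(x, E)$, $\rho(E, F)$, and $\operatorname{diam} E$ become a minimum or a maximum over finitely many distances. Here is a minimal Python sketch for finite, nonempty sets in $\mathbb{R}^n$ with the Euclidean metric. The function names are hypothetical, and `math.dist` requires Python 3.8 or later.

```python
import math
from itertools import product
from typing import Sequence, Tuple

Point = Tuple[float, ...]


def dist_point_set(x: Point, E: Sequence[Point]) -> float:
    """rho(x, E) = inf over y in E of |x - y| (a minimum, since E is finite)."""
    return min(math.dist(x, y) for y in E)


def dist_sets(E: Sequence[Point], F: Sequence[Point]) -> float:
    """rho(E, F) = inf over x in E of rho(x, F)."""
    return min(dist_point_set(x, F) for x in E)


def diam(E: Sequence[Point]) -> float:
    """diam E = sup over x, y in E of |x - y| (E nonempty)."""
    return max(math.dist(x, y) for x, y in product(E, repeat=2))
```

For $E = \{(0, 0), (3, 4)\}$ and $F = \{(6, 8), (10, 0)\}$ one gets $\rho(E, F) = 5$ and $\operatorname{diam} E = 5$.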

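If $E \subset X$ and $\{V_\alpha\}_{\alpha \in A}$ is a family of sets such that $E \subset \bigcup_{\alpha \in A} V_\alpha$, $\{V_\alpha\}_{\alpha \in A}$ is called a cover of $E$, and $E$ is said to be covered by the $V_\alpha$'s. $E$ is called totally bounded if, for every $\epsilon > 0$, $E$ can be covered by finitely many balls of radius $\epsilon$. Every totally bounded set is bounded, for if $x, y \in \bigcup_1^n B(\epsilon, z_j)$, say $x \in B(\epsilon, z_1)$ and $y \in B(\epsilon, z_2)$, then

$$\rho(x, y) \le \rho(x, z_1) + \rho(z_1, z_2) + \rho(z_2, y) \le 2\epsilon + \max\{\rho(z_j, z_k) : 1 \le j, k \le n\}.$$

(The converse is false in general.) If $E$ is totally bounded, so is $\overline{E}$, for it is easily seen that if $E \subset \bigcup_1^n B(\epsilon, z_j)$, then $\overline{E} \subset \bigcup_1^n B(2\epsilon, z_j)$.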


0.25 Theorem. If $E$ is a subset of the metric space $(X, \rho)$, the following are equivalent:

a. $E$ is complete and totally bounded.

b. (The Bolzano-Weierstrass Property) Every sequence in $E$ has a subsequence that converges to a point of $E$.

c. (The Heine-Borel Property) If $\{V_\alpha\}_{\alpha \in A}$ is a cover of $E$ by open sets, there is a finite set $F \subset A$ such that $\{V_\alpha\}_{\alpha \in F}$ covers $E$.


Proof. We shall show that (a) and (b) are equivalent, that (a) and (b) together 
imply (c), and finally that (c) implies (b). 

(a) implies (b): Suppose that (a) holds and $\{x_n\}$ is a sequence in $E$. $E$ can be covered by finitely many balls of radius $2^{-1}$, and at least one of them must contain $x_n$ for infinitely many $n$: say, $x_n \in B_1$ for $n \in N_1$. $E \cap B_1$ can be covered by finitely many balls of radius $2^{-2}$, and at least one of them must contain $x_n$ for infinitely many $n \in N_1$: say, $x_n \in B_2$ for $n \in N_2$. Continuing inductively, we obtain a sequence of balls $B_j$ of radius $2^{-j}$ and a decreasing sequence of subsets $N_j$ of $\mathbb{N}$ such that $x_n \in B_j$ for $n \in N_j$. Pick $n_1 \in N_1, n_2 \in N_2, \dots$ such that $n_1 < n_2 < \cdots$. Then $\{x_{n_j}\}$ is a Cauchy sequence, for $\rho(x_{n_j}, x_{n_k}) < 2^{1-j}$ if $k > j$, and since $E$ is complete, it has a limit in $E$.

(b) implies (a): We show that if either condition in (a) fails, then so does (b). If $E$ is not complete, there is a Cauchy sequence $\{x_n\}$ in $E$ with no limit in $E$. No subsequence of $\{x_n\}$ can converge in $E$, for otherwise the whole sequence would converge to the same limit. On the other hand, if $E$ is not totally bounded, let $\epsilon > 0$ be such that $E$ cannot be covered by finitely many balls of radius $\epsilon$. Choose $x_n \in E$ inductively as follows. Begin with any $x_1 \in E$, and having chosen $x_1, \dots, x_n$, pick $x_{n+1} \in E \setminus \bigcup_1^n B(\epsilon, x_j)$. Then $\rho(x_n, x_m) \ge \epsilon$ for all $n \ne m$, so $\{x_n\}$ has no convergent subsequence.

(a) and (b) imply (c): It suffices to show that if (b) holds and $\{V_\alpha\}_{\alpha \in A}$ is a cover of $E$ by open sets, there exists $\epsilon > 0$ such that every ball of radius $\epsilon$ that intersects $E$ is contained in some $V_\alpha$, for $E$ can be covered by finitely many such balls by (a). Suppose to the contrary that for each $n \in \mathbb{N}$ there is a ball $B_n$ of radius $2^{-n}$ such that $B_n \cap E \ne \varnothing$ and $B_n$ is contained in no $V_\alpha$. Pick $x_n \in B_n \cap E$; by passing to a subsequence we may assume that $\{x_n\}$ converges to some $x \in E$. We have $x \in V_\alpha$ for some $\alpha$, and since $V_\alpha$ is open, there exists $\epsilon > 0$ such that $B(\epsilon, x) \subset V_\alpha$. But if $n$ is large enough so that $\rho(x_n, x) < \epsilon/3$ and $2^{-n} < \epsilon/3$, then $B_n \subset B(\epsilon, x) \subset V_\alpha$, contradicting the assumption on $B_n$.

(c) implies (b): If $\{x_n\}$ is a sequence in $E$ with no convergent subsequence, for each $x \in E$ there is a ball $B_x$ centered at $x$ that contains $x_n$ for only finitely many $n$ (otherwise some subsequence would converge to $x$). Then $\{B_x\}_{x \in E}$ is a cover of $E$ by open sets with no finite subcover. ∎


A set $E$ that possesses the properties (a)-(c) of Theorem 0.25 is called compact. Every compact set is closed (by Proposition 0.24) and bounded; the converse is false in general but true in $\mathbb{R}^n$.


0.26 Proposition. Every closed and bounded subset of $\mathbb{R}^n$ is compact.


Proof. Since closed subsets of $\mathbb{R}^n$ are complete, it suffices to show that bounded subsets of $\mathbb{R}^n$ are totally bounded. Since every bounded set is contained in some cube

$$Q = [-R, R]^n = \{x \in \mathbb{R}^n : \max(|x_1|, \dots, |x_n|) \le R\},$$

it is enough to show that $Q$ is totally bounded. Given $\epsilon > 0$, pick an integer $k > R\sqrt{n}/\epsilon$, and express $Q$ as the union of $k^n$ congruent subcubes by dividing the interval $[-R, R]$ into $k$ equal pieces. The side length of these subcubes is $2R/k$ and hence their diameter is $\sqrt{n}\,(2R/k) < 2\epsilon$, so they are contained in the balls of radius $\epsilon$ about their centers. ∎
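The counting in this proof can be checked numerically. The Python sketch below is ours, including the function name `subcube_centers`. It takes $k$ to be the least integer exceeding $R\sqrt{n}/\epsilon$ and lists the $k^n$ centers of the subcubes. It relies on the fact that every point of a subcube lies within half the subcube's diameter, $\sqrt{n}\,R/k < \epsilon$, of its center.

```python
import itertools
import math
from typing import List, Tuple


def subcube_centers(R: float, n: int, eps: float) -> Tuple[int, List[Tuple[float, ...]]]:
    """Centers of the k**n congruent subcubes of Q = [-R, R]^n, where k is
    the least integer with k > R * sqrt(n) / eps, as in Proposition 0.26."""
    k = math.floor(R * math.sqrt(n) / eps) + 1
    side = 2 * R / k
    # Midpoints of the k equal pieces of [-R, R]; half the diameter of a
    # subcube is sqrt(n) * side / 2 = sqrt(n) * R / k, which is < eps.
    axis = [-R + side * (i + 0.5) for i in range(k)]
    return k, list(itertools.product(axis, repeat=n))
```

With $R = 1$, $n = 2$, $\epsilon = 0.3$ it gives $k = 5$ and $25$ centers, and every point of $[-1, 1]^2$ lies within $0.3$ of one of them.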


Two metrics $\rho_1$ and $\rho_2$ on a set $X$ are called equivalent if

$$C\rho_1 \le \rho_2 \le C'\rho_1 \quad \text{for some } C, C' > 0.$$


It is easily verified that equivalent metrics define the same open, closed, and compact 
sets, the same convergent and Cauchy sequences, and the same continuous and uni- 
formly continuous mappings. Consequently, most results concerning metric spaces 
depend not on the particular metric chosen but only on its equivalence class. 


0.7 NOTES AND REFERENCES 


§§0.1-0.4: The best exposition of set theory for beginners is Halmos [63], and Smullyan and Fitting [135] is a good text on a more advanced level. Kelley [83] also contains a concise account of basic axiomatic set theory. All of these books present a deduction of the Hausdorff maximal principle from the axiom of choice, as does Hewitt and Stromberg [76].

The axiom of choice (or one of the propositions equivalent to it) is generally taken 
as one of the basic postulates in the axiomatic formulations of set theory. Some 
mathematicians of the intuitionist or constructivist persuasion reject it on the grounds 
that one has not proved the existence of a mathematical object until one has shown 
how to construct it in some reasonably explicit fashion, whereas the whole point of 
the axiom of choice is to provide existence theorems when constructive methods fail 
(or are too cumbersome for comfort). People who are seriously bothered by such 
objections belong to a minority that does not include the present writer; in this book 
the axiom of choice is used sparingly but freely. 

The continuum hypothesis is the assertion that if $\operatorname{card}(X) < \mathfrak{c}$, then $X$ is countable. (Since it follows easily from the construction of $\Omega$, the set of countable ordinals, that $\operatorname{card}(\Omega) \le \operatorname{card}(X)$ for any uncountable $X$, an equivalent assertion is that $\operatorname{card}(\Omega) = \mathfrak{c}$.) It is known, thanks to Gödel and Cohen, that the continuum
hypothesis and its negation are both consistent with the standard axioms of set theory 
including the axiom of choice, assuming that those axioms are themselves consistent. 
(An exposition of the consistency and independence theorems for the axiom of choice 
and the continuum hypothesis can be found in Smullyan and Fitting [135].) Some 
mathematicians are willing to accept the continuum hypothesis as true, seemingly as 
a matter of convenience, but Gödel [56] and Cohen [26, p. 151] have both expressed 
suspicions that it should be false, and as of this writing no one has found any really 
compelling evidence on one side or the other. My own feeling, subject to revision 
in the event of a major breakthrough in set theory, is that if the answer to one’s 
question turns out to depend on the continuum hypothesis, one should give up and 
ask a different question. 


§0.6: A more detailed discussion of metric spaces can be found in Loomis and
Sternberg [95] and DePree and Swartz [32]. 





Measures 


In this chapter we set forth the basic concepts of measure theory, develop a general 
procedure for constructing nontrivial examples of measures, and apply this procedure 
to construct measures on the real line. 


1.1 INTRODUCTION 


One of the most venerable problems in geometry is to determine the area or volume 
of a region in the plane or in 3-space. The techniques of integral calculus provide a 
satisfactory solution to this problem for regions that are bounded by “nice” curves or 
surfaces but are inadequate to handle more complicated sets, even in dimension one. 
Ideally, for $n \in \mathbb{N}$ we would like to have a function $\mu$ that assigns to each $E \subset \mathbb{R}^n$ a number $\mu(E) \in [0, \infty]$, the $n$-dimensional measure of $E$, such that $\mu(E)$ is given by the usual integral formulas when the latter apply. Such a function $\mu$ should surely possess the following properties:


i. If $E_1, E_2, \dots$ is a finite or infinite sequence of disjoint sets, then

$$\mu(E_1 \cup E_2 \cup \cdots) = \mu(E_1) + \mu(E_2) + \cdots.$$

ii. If $E$ is congruent to $F$ (that is, if $E$ can be transformed into $F$ by translations, rotations, and reflections), then $\mu(E) = \mu(F)$.

iii. $\mu(Q) = 1$, where $Q$ is the unit cube

$$Q = \{x \in \mathbb{R}^n : 0 \le x_j < 1 \text{ for } j = 1, \dots, n\}.$$
Unfortunately, these conditions are mutually inconsistent. Let us see why this is true for $n = 1$. (The argument can easily be adapted to higher dimensions.) To begin with, we define an equivalence relation on $[0, 1)$ by declaring that $x \sim y$ iff $x - y$ is rational. Let $N$ be a subset of $[0, 1)$ that contains precisely one member of each equivalence class. (To find such an $N$, one must invoke the axiom of choice.) Next, let $R = \mathbb{Q} \cap [0, 1)$, and for each $r \in R$ let

$$N_r = \{x + r : x \in N \cap [0, 1 - r)\} \cup \{x + r - 1 : x \in N \cap [1 - r, 1)\}.$$

That is, to obtain $N_r$, shift $N$ to the right by $r$ units and then shift the part that sticks out beyond $[0, 1)$ one unit to the left. Then $N_r \subset [0, 1)$, and every $x \in [0, 1)$ belongs to precisely one $N_r$. Indeed, if $y$ is the element of $N$ that belongs to the equivalence class of $x$, then $x \in N_r$ where $r = x - y$ if $x \ge y$ or $r = x - y + 1$ if $x < y$; on the other hand, if $x \in N_r \cap N_s$ with $r \ne s$, then $x - r$ (or $x - r + 1$) and $x - s$ (or $x - s + 1$) would be distinct elements of $N$ belonging to the same equivalence class, which is impossible.
Suppose now that $\mu : \mathcal{P}(\mathbb{R}) \to [0, \infty]$ satisfies (i), (ii), and (iii). By (i) and (ii),

$$\mu(N) = \mu(N \cap [0, 1 - r)) + \mu(N \cap [1 - r, 1)) = \mu(N_r)$$

for any $r \in R$. Also, since $R$ is countable and $[0, 1)$ is the disjoint union of the $N_r$'s,

$$\mu([0, 1)) = \sum_{r \in R} \mu(N_r)$$

by (i) again. But $\mu([0, 1)) = 1$ by (iii), and since $\mu(N_r) = \mu(N)$, the sum on the right is either $0$ (if $\mu(N) = 0$) or $\infty$ (if $\mu(N) > 0$). Hence no such $\mu$ can exist.
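The set $N$ cannot be written down explicitly, but the bookkeeping that shows the $N_r$ partition $[0, 1)$ can be tried out in a finite toy model of our own devising, which is not part of the argument. Replace $[0, 1)$ by the twelve points $k/12$ and the rationals by the multiples of $1/4$. Then $N = \{0, 1/12, 2/12\}$ contains one representative of each class. The Python sketch below (exact arithmetic via `fractions.Fraction`; names are hypothetical) implements the shift-and-wrap map $x \mapsto (x + r) \bmod 1$ that defines $N_r$.

```python
from fractions import Fraction
from typing import Iterable, Set


def shift_mod_one(S: Iterable[Fraction], r: Fraction) -> Set[Fraction]:
    """For S inside [0, 1) and 0 <= r < 1, shift S right by r and move the
    part that lands in [1, 2) one unit to the left: x -> (x + r) mod 1."""
    return {(x + r) % 1 for x in S}


# Toy model (our own): twelve points k/12 stand in for [0, 1), multiples of
# 1/4 stand in for the rationals, and x ~ y iff x - y is a multiple of 1/4.
G = {Fraction(k, 12) for k in range(12)}
R = [Fraction(j, 4) for j in range(4)]
N = {Fraction(k, 12) for k in range(3)}  # one representative per class
pieces = [shift_mod_one(N, r) for r in R]
```

In the toy model the four sets $N_r$ are disjoint, each has three elements, and together they exhaust the twelve points. The normalized counting measure $\mu(E) = \operatorname{card}(E)/12$ then gives $\sum_r \mu(N_r) = 4 \cdot \tfrac{3}{12} = 1$ with no contradiction. The contradiction in the text arises precisely because $R$ is countably infinite, and a sum of infinitely many copies of $\mu(N)$ cannot equal $1$.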

Faced with this discouraging situation, one might consider weakening (i) so that 
additivity is required to hold only for finite sequences. This is not a very good idea, 
as we shall see: The additivity for countable sequences is what makes all the limit 
and continuity results of the theory work smoothly. Moreover, in dimensions $n \ge 3$, even this weak form of (i) is inconsistent with (ii) and (iii). Indeed, in 1924 Banach and Tarski proved the following amazing result:

Let $U$ and $V$ be arbitrary bounded open sets in $\mathbb{R}^n$, $n \ge 3$. There exist $k \in \mathbb{N}$ and subsets $E_1, \dots, E_k, F_1, \dots, F_k$ of $\mathbb{R}^n$ such that

- the $E_j$'s are disjoint and their union is $U$;
- the $F_j$'s are disjoint and their union is $V$;
- $E_j$ is congruent to $F_j$ for $j = 1, \dots, k$.

Thus one can cut up a ball the size of a pea into a finite number of pieces and rearrange them to form a ball the size of the earth! Needless to say, the sets $E_j$ and $F_j$ are very bizarre. They cannot be visualized accurately, and their construction depends on the axiom of choice. But their existence clearly precludes the construction of any $\mu : \mathcal{P}(\mathbb{R}^n) \to [0, \infty]$ that assigns positive, finite values to bounded open sets and satisfies (i) for finite sequences as well as (ii).
The moral of these examples is that $\mathbb{R}^n$ contains subsets which are so strangely put together that it is impossible to define a geometrically reasonable notion of measure for them, and the remedy for the situation is to discard the requirement that $\mu$ should be defined on all subsets of $\mathbb{R}^n$. Rather, we shall content ourselves with constructing $\mu$ on a class of subsets of $\mathbb{R}^n$ that includes all the sets one is likely to meet in practice unless one is deliberately searching for pathological examples. This construction will be carried out for $n = 1$ in §1.5 and for $n > 1$ in §2.6.

It is worthwhile, and not much extra work, to develop the theory in much greater 
generality. The conditions (ii) and (iii) are directly related to Euclidean geometry, 
but set functions satisfying (i), called measures, arise also in a great many other situations. For example, in a physics problem involving mass distributions, $\mu(E)$ could represent the total mass in the region $E$. For another example, in probability theory one considers a set $X$ that represents the possible outcomes of an experiment, and for $E \subset X$, $\mu(E)$ is the probability that the outcome lies in $E$. We therefore
begin by studying the theory of measures on abstract sets. 


1.2 σ-ALGEBRAS


In this section we discuss the families of sets that serve as the domains of measures. 

Let $X$ be a nonempty set. An algebra of sets on $X$ is a nonempty collection $\mathcal{A}$ of subsets of $X$ that is closed under finite unions and complements; in other words, if $E_1, \dots, E_n \in \mathcal{A}$, then $\bigcup_1^n E_j \in \mathcal{A}$; and if $E \in \mathcal{A}$, then $E^c \in \mathcal{A}$. A $\sigma$-algebra is an algebra that is closed under countable unions. (Some authors use the terms field and $\sigma$-field instead of algebra and $\sigma$-algebra.)

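We observe that since $\bigcap_j E_j = \big(\bigcup_j E_j^c\big)^c$, algebras (resp. $\sigma$-algebras) are also closed under finite (resp. countable) intersections. Moreover, if $\mathcal{A}$ is an algebra, then $\varnothing \in \mathcal{A}$ and $X \in \mathcal{A}$, for if $E \in \mathcal{A}$ we have $\varnothing = E \cap E^c$ and $X = E \cup E^c$.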

It is worth noting that an algebra $\mathcal{A}$ is a $\sigma$-algebra provided that it is closed under countable disjoint unions. Indeed, suppose $\{E_j\}_1^\infty \subset \mathcal{A}$. Set

$$F_k = E_k \setminus \Big[\bigcup_1^{k-1} E_j\Big] = E_k \cap \Big[\bigcup_1^{k-1} E_j\Big]^c.$$

Then the $F_k$'s belong to $\mathcal{A}$ and are disjoint, and $\bigcup_1^\infty E_j = \bigcup_1^\infty F_k$. This device of replacing a sequence of sets by a disjoint sequence is worth remembering; it will be used a number of times below.
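For finitely many sets, the disjointification device is a single pass over the sequence. Here is a Python sketch, in which the name `disjointify` is ours and Python sets stand in for members of $\mathcal{A}$:

```python
from typing import Hashable, Iterable, List, Set


def disjointify(sets: Iterable[Iterable[Hashable]]) -> List[Set[Hashable]]:
    """Return F_1, F_2, ... with F_k = E_k minus (E_1 ∪ ... ∪ E_{k-1}).

    The F_k are pairwise disjoint, F_k ⊂ E_k, and F_1 ∪ ... ∪ F_n equals
    E_1 ∪ ... ∪ E_n for every n.
    """
    seen: Set[Hashable] = set()  # E_1 ∪ ... ∪ E_{k-1}
    result: List[Set[Hashable]] = []
    for E in sets:
        E = set(E)
        result.append(E - seen)
        seen |= E
    return result
```

For instance, `disjointify([{1, 2, 3}, {2, 3, 4}, {5}, {1, 5, 6}])` returns `[{1, 2, 3}, {4}, {5}, {6}]`.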

Some examples: If $X$ is any set, $\mathcal{P}(X)$ and $\{\varnothing, X\}$ are $\sigma$-algebras. If $X$ is uncountable, then

$$\mathcal{A} = \{E \subset X : E \text{ is countable or } E^c \text{ is countable}\}$$

is a $\sigma$-algebra, called the $\sigma$-algebra of countable or co-countable sets. (The point here is that if $\{E_j\}_1^\infty \subset \mathcal{A}$, then $\bigcup_1^\infty E_j$ is countable if all $E_j$ are countable and is co-countable otherwise.)
It is trivial to verify that the intersection of any family of $\sigma$-algebras on $X$ is again a $\sigma$-algebra. It follows that if $\mathcal{E}$ is any subset of $\mathcal{P}(X)$, there is a unique smallest $\sigma$-algebra $\mathcal{M}(\mathcal{E})$ containing $\mathcal{E}$, namely, the intersection of all $\sigma$-algebras containing $\mathcal{E}$. (There is always at least one such, namely, $\mathcal{P}(X)$.) $\mathcal{M}(\mathcal{E})$ is called the $\sigma$-algebra generated by $\mathcal{E}$. The following observation is often useful:


1.1 Lemma. If $\mathcal{E} \subset \mathcal{M}(\mathcal{F})$ then $\mathcal{M}(\mathcal{E}) \subset \mathcal{M}(\mathcal{F})$.


Proof. $\mathcal{M}(\mathcal{F})$ is a $\sigma$-algebra containing $\mathcal{E}$; it therefore contains $\mathcal{M}(\mathcal{E})$. ∎
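On a finite set $X$, a countable union of subsets of $X$ is a finite union, since there are only finitely many distinct subsets. Hence algebras and $\sigma$-algebras coincide there, and $\mathcal{M}(\mathcal{E})$ can be computed by repeatedly adding complements and pairwise unions until nothing new appears. The Python sketch below is our own illustration of this fixed-point computation, and its names are hypothetical.

```python
from itertools import combinations
from typing import FrozenSet, Iterable, Set


def generated_sigma_algebra(X: Iterable, E: Iterable[Iterable]) -> Set[FrozenSet]:
    """Smallest family of subsets of the finite set X that contains E and is
    closed under complements and unions (for finite X this is M(E))."""
    X = frozenset(X)
    M = {frozenset(), X} | {frozenset(A) for A in E}
    while True:
        new = {X - A for A in M} | {A | B for A, B in combinations(M, 2)}
        if new <= M:
            # Closed under complements and pairwise unions, hence under
            # all finite unions.
            return M
        M |= new
```

For $X = \{1, 2, 3\}$, the family $\{\{1\}\}$ generates $\{\varnothing, \{1\}, \{2, 3\}, X\}$, while $\{\{1\}, \{2\}\}$ generates all of $\mathcal{P}(X)$. Since $\{2, 3\}$ belongs to the latter, Lemma 1.1 predicts that $\mathcal{M}(\{\{2, 3\}\}) \subset \mathcal{M}(\{\{1\}, \{2\}\})$, and the function can be used to confirm this.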


If $X$ is any metric space, or more generally any topological space (see Chapter 4), the $\sigma$-algebra generated by the family of open sets in $X$ (or, equivalently, by the family of closed sets in $X$) is called the Borel $\sigma$-algebra on $X$ and is denoted by $\mathcal{B}_X$. Its members are called Borel sets. $\mathcal{B}_X$ thus includes open sets, closed sets, countable intersections of open sets, countable unions of closed sets, and so forth.

There is a standard terminology for the levels in this hierarchy. A countable intersection of open sets is called a $G_\delta$ set; a countable union of closed sets is called an $F_\sigma$ set; a countable union of $G_\delta$ sets is called a $G_{\delta\sigma}$ set; a countable intersection of $F_\sigma$ sets is called an $F_{\sigma\delta}$ set; and so forth. ($\delta$ and $\sigma$ stand for the German Durchschnitt and Summe, that is, intersection and union.)

The Borel $\sigma$-algebra on $\mathbb{R}$ will play a fundamental role in what follows. For future reference we note that it can be generated in a number of different ways:

1.2 Proposition. $\mathcal{B}_\mathbb{R}$ is generated by each of the following:

a. the open intervals: $\mathcal{E}_1 = \{(a, b) : a < b\}$,
b. the closed intervals: $\mathcal{E}_2 = \{[a, b] : a < b\}$,
c. the half-open intervals: $\mathcal{E}_3 = \{(a, b] : a < b\}$ or $\mathcal{E}_4 = \{[a, b) : a < b\}$,
d. the open rays: $\mathcal{E}_5 = \{(a, \infty) : a \in \mathbb{R}\}$ or $\mathcal{E}_6 = \{(-\infty, a) : a \in \mathbb{R}\}$,
e. the closed rays: $\mathcal{E}_7 = \{[a, \infty) : a \in \mathbb{R}\}$ or $\mathcal{E}_8 = \{(-\infty, a] : a \in \mathbb{R}\}$.


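Proof. The elements of $\mathcal{E}_j$ for $j \ne 3, 4$ are open or closed, and the elements of $\mathcal{E}_3$ and $\mathcal{E}_4$ are $G_\delta$ sets; for example, $(a, b] = \bigcap_1^\infty (a, b + n^{-1})$. All of these are Borel sets, so by Lemma 1.1, $\mathcal{M}(\mathcal{E}_j) \subset \mathcal{B}_\mathbb{R}$ for all $j$. On the other hand, every open set in $\mathbb{R}$ is a countable union of open intervals (Proposition 0.21), so by Lemma 1.1 again, $\mathcal{B}_\mathbb{R} \subset \mathcal{M}(\mathcal{E}_1)$. That $\mathcal{B}_\mathbb{R} \subset \mathcal{M}(\mathcal{E}_j)$ for $j \ge 2$ can now be established by showing that all open intervals lie in $\mathcal{M}(\mathcal{E}_j)$ and applying Lemma 1.1. For example, $(a, b) = \bigcup_1^\infty [a + n^{-1}, b - n^{-1}] \in \mathcal{M}(\mathcal{E}_2)$. Verification of the other cases is left to the reader (Exercise 2). ∎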


Let $\{X_\alpha\}_{\alpha \in A}$ be an indexed collection of nonempty sets, $X = \prod_{\alpha \in A} X_\alpha$, and $\pi_\alpha : X \to X_\alpha$ the coordinate maps. If $\mathcal{M}_\alpha$ is a $\sigma$-algebra on $X_\alpha$ for each $\alpha$, the product $\sigma$-algebra on $X$ is the $\sigma$-algebra generated by

$$\{\pi_\alpha^{-1}(E_\alpha) : E_\alpha \in \mathcal{M}_\alpha,\ \alpha \in A\}.$$

We denote this $\sigma$-algebra by $\bigotimes_{\alpha \in A} \mathcal{M}_\alpha$. (If $A = \{1, \dots, n\}$ we also write $\bigotimes_1^n \mathcal{M}_j$ or $\mathcal{M}_1 \otimes \cdots \otimes \mathcal{M}_n$.) The significance of this definition will become clearer in §2.1; for the moment we give an alternative, and perhaps more intuitive, characterization of product $\sigma$-algebras in the case of countably many factors.


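1.3 Proposition. If $A$ is countable, then $\bigotimes_{\alpha \in A} \mathcal{M}_\alpha$ is the $\sigma$-algebra generated by $\{\prod_{\alpha \in A} E_\alpha : E_\alpha \in \mathcal{M}_\alpha\}$.

Proof. If $E_\alpha \in \mathcal{M}_\alpha$, then $\pi_\alpha^{-1}(E_\alpha) = \prod_{\beta \in A} E_\beta$ where $E_\beta = X_\beta$ for $\beta \ne \alpha$; on the other hand, $\prod_{\alpha \in A} E_\alpha = \bigcap_{\alpha \in A} \pi_\alpha^{-1}(E_\alpha)$, a countable intersection since $A$ is countable. The result therefore follows from Lemma 1.1. ∎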


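1.4 Proposition. Suppose that $\mathcal{M}_\alpha$ is generated by $\mathcal{E}_\alpha$, $\alpha \in A$. Then $\bigotimes_{\alpha \in A} \mathcal{M}_\alpha$ is generated by $\mathcal{F}_1 = \{\pi_\alpha^{-1}(E_\alpha) : E_\alpha \in \mathcal{E}_\alpha,\ \alpha \in A\}$. If $A$ is countable and $X_\alpha \in \mathcal{E}_\alpha$ for all $\alpha$, $\bigotimes_{\alpha \in A} \mathcal{M}_\alpha$ is generated by $\mathcal{F}_2 = \{\prod_{\alpha \in A} E_\alpha : E_\alpha \in \mathcal{E}_\alpha\}$.

Proof. Obviously $\mathcal{M}(\mathcal{F}_1) \subset \bigotimes_{\alpha \in A} \mathcal{M}_\alpha$. On the other hand, for each $\alpha$, the collection $\{E \subset X_\alpha : \pi_\alpha^{-1}(E) \in \mathcal{M}(\mathcal{F}_1)\}$ is easily seen to be a $\sigma$-algebra on $X_\alpha$ that contains $\mathcal{E}_\alpha$ and hence $\mathcal{M}_\alpha$. In other words, $\pi_\alpha^{-1}(E) \in \mathcal{M}(\mathcal{F}_1)$ for all $E \in \mathcal{M}_\alpha$, $\alpha \in A$, and hence $\bigotimes_{\alpha \in A} \mathcal{M}_\alpha \subset \mathcal{M}(\mathcal{F}_1)$. The second assertion follows from the first as in the proof of Proposition 1.3. ∎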


1.5 Proposition. Let $X_1, \dots, X_n$ be metric spaces and let $X = \prod_1^n X_j$, equipped with the product metric. Then $\bigotimes_1^n \mathcal{B}_{X_j} \subset \mathcal{B}_X$. If the $X_j$'s are separable, then $\bigotimes_1^n \mathcal{B}_{X_j} = \mathcal{B}_X$.

Proof. By Proposition 1.4, $\bigotimes_1^n \mathcal{B}_{X_j}$ is generated by the sets $\pi_j^{-1}(U_j)$, $1 \le j \le n$, where $U_j$ is open in $X_j$. Since these sets are open in $X$, Lemma 1.1 implies that $\bigotimes_1^n \mathcal{B}_{X_j} \subset \mathcal{B}_X$. Suppose now that $C_j$ is a countable dense set in $X_j$, and let $\mathcal{E}_j$ be the collection of balls in $X_j$ with rational radius and center in $C_j$. Then every open set in $X_j$ is a union of members of $\mathcal{E}_j$; in fact, a countable union since $\mathcal{E}_j$ itself is countable. Moreover, the set of points in $X$ whose $j$th coordinate is in $C_j$ for all $j$ is a countable dense subset of $X$, and the balls of radius $r$ in $X$ are merely products of balls of radius $r$ in the $X_j$'s. It follows that $\mathcal{B}_{X_j}$ is generated by $\mathcal{E}_j$ and $\mathcal{B}_X$ is generated by $\{\prod_1^n E_j : E_j \in \mathcal{E}_j\}$. Therefore $\mathcal{B}_X = \bigotimes_1^n \mathcal{B}_{X_j}$ by Proposition 1.4. ∎


1.6 Corollary. $\mathcal{B}_{\mathbb{R}^n} = \bigotimes_1^n \mathcal{B}_\mathbb{R}$.


We conclude this section with a technical result that will be needed later. We define an elementary family to be a collection $\mathcal{E}$ of subsets of $X$ such that

• $\varnothing \in \mathcal{E}$,
• if $E, F \in \mathcal{E}$ then $E \cap F \in \mathcal{E}$,
• if $E \in \mathcal{E}$ then $E^c$ is a finite disjoint union of members of $\mathcal{E}$.


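1.7 Proposition. If $\mathcal{E}$ is an elementary family, the collection $\mathcal{A}$ of finite disjoint unions of members of $\mathcal{E}$ is an algebra.

Proof. If $A, B \in \mathcal{E}$ and $B^c = \bigcup_1^J C_j$ ($C_j \in \mathcal{E}$, disjoint), then $A \setminus B = \bigcup_1^J (A \cap C_j)$ and $A \cup B = (A \setminus B) \cup B$, where these unions are disjoint, so $A \setminus B \in \mathcal{A}$ and $A \cup B \in \mathcal{A}$. It now follows by induction that if $A_1, \dots, A_n \in \mathcal{E}$, then $\bigcup_1^n A_j \in \mathcal{A}$; indeed, by inductive hypothesis we may assume that $A_1, \dots, A_{n-1}$ are disjoint, and then $\bigcup_1^n A_j = A_n \cup \bigcup_1^{n-1} (A_j \setminus A_n)$, which is a disjoint union. To see that $\mathcal{A}$ is closed under complements, suppose $A_1, \dots, A_n \in \mathcal{E}$ and $A_m^c = \bigcup_{j=1}^{J_m} B_m^j$, with $B_m^1, \dots, B_m^{J_m}$ disjoint members of $\mathcal{E}$. Then

$$\Big(\bigcup_{m=1}^n A_m\Big)^c = \bigcap_{m=1}^n \Big(\bigcup_{j=1}^{J_m} B_m^j\Big) = \bigcup\Big\{B_1^{j_1} \cap \cdots \cap B_n^{j_n} : 1 \le j_m \le J_m,\ 1 \le m \le n\Big\},$$

which is in $\mathcal{A}$. ∎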

Exercises 


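1. A family of sets $\mathcal{R} \subset \mathcal{P}(X)$ is called a ring if it is closed under finite unions and differences (i.e., if $E_1, \dots, E_n \in \mathcal{R}$, then $\bigcup_1^n E_j \in \mathcal{R}$, and if $E, F \in \mathcal{R}$, then $E \setminus F \in \mathcal{R}$). A ring that is closed under countable unions is called a $\sigma$-ring.
a. Rings (resp. $\sigma$-rings) are closed under finite (resp. countable) intersections.
b. If $\mathcal{R}$ is a ring (resp. $\sigma$-ring), then $\mathcal{R}$ is an algebra (resp. $\sigma$-algebra) iff $X \in \mathcal{R}$.
c. If $\mathcal{R}$ is a $\sigma$-ring, then $\{E \subset X : E \in \mathcal{R} \text{ or } E^c \in \mathcal{R}\}$ is a $\sigma$-algebra.
d. If $\mathcal{R}$ is a $\sigma$-ring, then $\{E \subset X : E \cap F \in \mathcal{R} \text{ for all } F \in \mathcal{R}\}$ is a $\sigma$-algebra.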


2. Complete the proof of Proposition 1.2. 


3. Let $\mathcal{M}$ be an infinite $\sigma$-algebra.
a. $\mathcal{M}$ contains an infinite sequence of disjoint sets.
b. $\operatorname{card}(\mathcal{M}) \ge \mathfrak{c}$.


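4. An algebra $\mathcal{A}$ is a $\sigma$-algebra iff $\mathcal{A}$ is closed under countable increasing unions (i.e., if $\{E_j\}_1^\infty \subset \mathcal{A}$ and $E_1 \subset E_2 \subset \cdots$, then $\bigcup_1^\infty E_j \in \mathcal{A}$).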


5. If $\mathcal{M}$ is the $\sigma$-algebra generated by $\mathcal{E}$, then $\mathcal{M}$ is the union of the $\sigma$-algebras generated by $\mathcal{F}$ as $\mathcal{F}$ ranges over all countable subsets of $\mathcal{E}$. (Hint: Show that the latter object is a $\sigma$-algebra.)


1.3 MEASURES 


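Let $X$ be a set equipped with a $\sigma$-algebra $\mathcal{M}$. A measure on $\mathcal{M}$ (or on $(X, \mathcal{M})$, or simply on $X$ if $\mathcal{M}$ is understood) is a function $\mu : \mathcal{M} \to [0, \infty]$ such that

i. $\mu(\varnothing) = 0$,
ii. if $\{E_j\}_1^\infty$ is a sequence of disjoint sets in $\mathcal{M}$, then $\mu\big(\bigcup_1^\infty E_j\big) = \sum_1^\infty \mu(E_j)$.

Property (ii) is called countable additivity. It implies finite additivity: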
ii′. if $E_1, \dots, E_n$ are disjoint sets in $\mathcal{M}$, then $\mu\big(\bigcup_1^n E_j\big) = \sum_1^n \mu(E_j)$,

because one can take $E_j = \varnothing$ for $j > n$. A function $\mu$ that satisfies (i) and (ii′) but not necessarily (ii) is called a finitely additive measure.

If $X$ is a set and $\mathcal{M} \subset \mathcal{P}(X)$ is a $\sigma$-algebra, $(X, \mathcal{M})$ is called a measurable space and the sets in $\mathcal{M}$ are called measurable sets. If $\mu$ is a measure on $(X, \mathcal{M})$, then $(X, \mathcal{M}, \mu)$ is called a measure space.

Let $(X, \mathcal{M}, \mu)$ be a measure space. Here is some standard terminology concerning the “size” of $\mu$. If $\mu(X) < \infty$ (which implies that $\mu(E) < \infty$ for all $E \in \mathcal{M}$ since $\mu(X) = \mu(E) + \mu(E^c)$), $\mu$ is called finite. If $X = \bigcup_1^\infty E_j$ where $E_j \in \mathcal{M}$ and $\mu(E_j) < \infty$ for all $j$, $\mu$ is called $\sigma$-finite. More generally, if $E = \bigcup_1^\infty E_j$ where $E_j \in \mathcal{M}$ and $\mu(E_j) < \infty$ for all $j$, the set $E$ is said to be $\sigma$-finite for $\mu$. (It would be correct but more cumbersome to say that $E$ is of $\sigma$-finite measure.) If for each $E \in \mathcal{M}$ with $\mu(E) = \infty$ there exists $F \in \mathcal{M}$ with $F \subset E$ and $0 < \mu(F) < \infty$, $\mu$ is called semifinite.

Every $\sigma$-finite measure is semifinite (Exercise 13), but not conversely. Most measures that arise in practice are $\sigma$-finite, which is fortunate since non-$\sigma$-finite measures tend to exhibit pathological behavior. The properties of non-$\sigma$-finite measures will be explored from time to time in the exercises.

Let us examine a few examples of measures. These examples are of a rather trivial 
nature, although the first one is of practical importance. The construction of more 
interesting examples is a task to which we shall turn in the next two sections. 


• Let $X$ be any nonempty set, $\mathcal{M} = \mathcal{P}(X)$, and $f$ any function from $X$ to $[0, \infty]$. Then $f$ determines a measure $\mu$ on $\mathcal{M}$ by the formula $\mu(E) = \sum_{x \in E} f(x)$. (For the definition of such possibly uncountable sums, see §0.5.) The reader may verify that $\mu$ is semifinite iff $f(x) < \infty$ for every $x \in X$, and $\mu$ is $\sigma$-finite iff $\mu$ is semifinite and $\{x : f(x) > 0\}$ is countable. Two special cases are of particular significance: If $f(x) = 1$ for all $x$, $\mu$ is called counting measure; and if, for some $x_0 \in X$, $f$ is defined by $f(x_0) = 1$ and $f(x) = 0$ for $x \ne x_0$, $\mu$ is called the point mass or Dirac measure at $x_0$. (The same names are also applied to the restrictions of these measures to smaller $\sigma$-algebras on $X$.)

• Let $X$ be an uncountable set, and let $\mathcal{M}$ be the $\sigma$-algebra of countable or co-countable sets. The function $\mu$ on $\mathcal{M}$ defined by $\mu(E) = 0$ if $E$ is countable and $\mu(E) = 1$ if $E$ is co-countable is easily seen to be a measure.

• Let $X$ be an infinite set and $\mathcal{M} = \mathcal{P}(X)$. Define $\mu(E) = 0$ if $E$ is finite, $\mu(E) = \infty$ if $E$ is infinite. Then $\mu$ is a finitely additive measure but not a measure.
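For a finite set $X$ the sum $\sum_{x \in E} f(x)$ in the first example is an ordinary finite sum, and counting measure and point masses are special choices of the weight $f$. Here is a Python sketch restricted to finite $X$. The names are ours, and `float('inf')` plays the role of the value $\infty$.

```python
from typing import Callable, Dict, Hashable, Iterable


def measure_from_weights(f: Dict[Hashable, float]) -> Callable[[Iterable[Hashable]], float]:
    """mu(E) = sum of f(x) over x in E, for a weight function f on a finite X
    with values in [0, inf]; float('inf') represents the value infinity."""
    def mu(E: Iterable[Hashable]) -> float:
        return sum(f[x] for x in set(E))  # the empty sum gives mu(∅) = 0
    return mu


def counting_measure(X: Iterable[Hashable]):
    """Counting measure: f(x) = 1 for every x."""
    return measure_from_weights({x: 1 for x in X})


def dirac_measure(X: Iterable[Hashable], x0: Hashable):
    """Point mass at x0: f(x0) = 1 and f(x) = 0 for x != x0."""
    return measure_from_weights({x: (1 if x == x0 else 0) for x in X})
```

Additivity on disjoint sets is then just the splitting of a finite sum. With weights $f(a) = \infty$, $f(b) = 1$, $f(c) = 2$, for instance, $\mu(\{a, b\}) = \infty$ while $\mu(\{b, c\}) = \mu(\{b\}) + \mu(\{c\}) = 3$.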


The basic properties of measures are summarized in the following theorem. 


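1.8 Theorem. Let $(X, \mathcal{M}, \mu)$ be a measure space.

a. (Monotonicity) If $E, F \in \mathcal{M}$ and $E \subset F$, then $\mu(E) \le \mu(F)$.
b. (Subadditivity) If $\{E_j\}_1^\infty \subset \mathcal{M}$, then $\mu\big(\bigcup_1^\infty E_j\big) \le \sum_1^\infty \mu(E_j)$.
c. (Continuity from below) If $\{E_j\}_1^\infty \subset \mathcal{M}$ and $E_1 \subset E_2 \subset \cdots$, then $\mu\big(\bigcup_1^\infty E_j\big) = \lim_{j \to \infty} \mu(E_j)$.
d. (Continuity from above) If $\{E_j\}_1^\infty \subset \mathcal{M}$, $E_1 \supset E_2 \supset \cdots$, and $\mu(E_1) < \infty$, then $\mu\big(\bigcap_1^\infty E_j\big) = \lim_{j \to \infty} \mu(E_j)$.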


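Proof. (a) If $E \subset F$, then $\mu(F) = \mu(E) + \mu(F \setminus E) \ge \mu(E)$.

(b) Let $F_1 = E_1$ and $F_k = E_k \setminus \big(\bigcup_1^{k-1} E_j\big)$ for $k > 1$. Then the $F_k$'s are disjoint and $\bigcup_1^n F_j = \bigcup_1^n E_j$ for all $n$. Therefore, by (a),

$$\mu\Big(\bigcup_1^\infty E_j\Big) = \mu\Big(\bigcup_1^\infty F_j\Big) = \sum_1^\infty \mu(F_j) \le \sum_1^\infty \mu(E_j).$$

(c) Setting $E_0 = \varnothing$, we have

$$\mu\Big(\bigcup_1^\infty E_j\Big) = \sum_1^\infty \mu(E_j \setminus E_{j-1}) = \lim_{n \to \infty} \sum_1^n \mu(E_j \setminus E_{j-1}) = \lim_{n \to \infty} \mu(E_n).$$

(d) Let $F_j = E_1 \setminus E_j$; then $F_1 \subset F_2 \subset \cdots$, $\mu(E_1) = \mu(F_j) + \mu(E_j)$, and $\bigcup_1^\infty F_j = E_1 \setminus \big(\bigcap_1^\infty E_j\big)$. By (c), then,

$$\mu(E_1) = \mu\Big(\bigcap_1^\infty E_j\Big) + \lim_{j \to \infty} \mu(F_j) = \mu\Big(\bigcap_1^\infty E_j\Big) + \lim_{j \to \infty} \big[\mu(E_1) - \mu(E_j)\big].$$

Since $\mu(E_1) < \infty$, we may subtract it from both sides to yield the desired result. ∎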


We remark that the condition $\mu(E_1) < \infty$ in part (d) could be replaced by $\mu(E_n) < \infty$ for some $n > 1$, as the first $n - 1$ $E_j$'s can be discarded from the sequence without affecting the intersection. However, some finiteness assumption is necessary, as it can happen that $\mu(E_j) = \infty$ for all $j$ but $\mu\big(\bigcap_1^\infty E_j\big) < \infty$. (For example, let $\mu$ be counting measure on $(\mathbb{N}, \mathcal{P}(\mathbb{N}))$ and let $E_j = \{n : n \ge j\}$; then $\bigcap_1^\infty E_j = \varnothing$.)

If $(X, \mathcal{M}, \mu)$ is a measure space, a set $E \in \mathcal{M}$ such that $\mu(E) = 0$ is called a null set. By subadditivity, any countable union of null sets is a null set, a fact which we shall use frequently. If a statement about points $x \in X$ is true except for $x$ in some null set, we say that it is true almost everywhere (abbreviated a.e.), or for almost every $x$. (If more precision is needed, we shall speak of a $\mu$-null set, or $\mu$-almost everywhere.)

If $\mu(E) = 0$ and $F \subset E$, then $\mu(F) = 0$ by monotonicity provided that $F \in \mathcal{M}$, but in general it need not be true that $F \in \mathcal{M}$. A measure whose domain includes all subsets of null sets is called complete. Completeness can sometimes obviate annoying technical points, and it can always be achieved by enlarging the domain of $\mu$, as follows.


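1.9 Theorem. Suppose that $(X, \mathcal{M}, \mu)$ is a measure space. Let $\mathcal{N} = \{N \in \mathcal{M} : \mu(N) = 0\}$ and $\overline{\mathcal{M}} = \{E \cup F : E \in \mathcal{M} \text{ and } F \subset N \text{ for some } N \in \mathcal{N}\}$. Then $\overline{\mathcal{M}}$ is a $\sigma$-algebra, and there is a unique extension $\overline{\mu}$ of $\mu$ to a complete measure on $\overline{\mathcal{M}}$.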
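Proof. Since $\mathcal{M}$ and $\mathcal{N}$ are closed under countable unions, so is $\overline{\mathcal{M}}$. If $E \cup F \in \overline{\mathcal{M}}$ where $E \in \mathcal{M}$ and $F \subset N \in \mathcal{N}$, we can assume that $E \cap N = \varnothing$ (otherwise, replace $F$ and $N$ by $F \setminus E$ and $N \setminus E$). Then $E \cup F = (E \cup N) \cap (N^c \cup F)$, so $(E \cup F)^c = (E \cup N)^c \cup (N \setminus F)$. But $(E \cup N)^c \in \mathcal{M}$ and $N \setminus F \subset N$, so that $(E \cup F)^c \in \overline{\mathcal{M}}$. Thus $\overline{\mathcal{M}}$ is a $\sigma$-algebra.

If $E \cup F \in \overline{\mathcal{M}}$ as above, we set $\overline{\mu}(E \cup F) = \mu(E)$. This is well defined, since if $E_1 \cup F_1 = E_2 \cup F_2$ where $F_j \subset N_j \in \mathcal{N}$, then $E_1 \subset E_2 \cup N_2$ and so $\mu(E_1) \le \mu(E_2) + \mu(N_2) = \mu(E_2)$, and likewise $\mu(E_2) \le \mu(E_1)$. It is easily verified that $\overline{\mu}$ is a complete measure on $\overline{\mathcal{M}}$, and that $\overline{\mu}$ is the only measure on $\overline{\mathcal{M}}$ that extends $\mu$; details are left to the reader (Exercise 6). ∎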


The measure $\overline{\mu}$ in Theorem 1.9 is called the completion of $\mu$, and $\overline{\mathcal{M}}$ is called the
completion of $\mathcal{M}$ with respect to $\mu$.


Exercises 
6. Complete the proof of Theorem 1.9. 


7. If $\mu_1, \ldots, \mu_n$ are measures on $(X, \mathcal{M})$ and $a_1, \ldots, a_n \in [0, \infty)$, then $\sum_1^n a_j \mu_j$
is a measure on $(X, \mathcal{M})$.

8. If $(X, \mathcal{M}, \mu)$ is a measure space and $\{E_j\}_1^\infty \subset \mathcal{M}$, then $\mu(\liminf E_j) \le
\liminf \mu(E_j)$. Also, $\mu(\limsup E_j) \ge \limsup \mu(E_j)$ provided that $\mu(\bigcup_1^\infty E_j) <
\infty$.


9. If $(X, \mathcal{M}, \mu)$ is a measure space and $E, F \in \mathcal{M}$, then $\mu(E) + \mu(F) = \mu(E \cup F) +
\mu(E \cap F)$.


10. Given a measure space $(X, \mathcal{M}, \mu)$ and $E \in \mathcal{M}$, define $\mu_E(A) = \mu(A \cap E)$ for
$A \in \mathcal{M}$. Then $\mu_E$ is a measure.




11. A finitely additive measure $\mu$ is a measure iff it is continuous from below as in
Theorem 1.8c. If $\mu(X) < \infty$, $\mu$ is a measure iff it is continuous from above as in
Theorem 1.8d.


12. Let $(X, \mathcal{M}, \mu)$ be a finite measure space.
a. If $E, F \in \mathcal{M}$ and $\mu(E \triangle F) = 0$, then $\mu(E) = \mu(F)$.
b. Say that $E \sim F$ if $\mu(E \triangle F) = 0$; then $\sim$ is an equivalence relation on $\mathcal{M}$.
c. For $E, F \in \mathcal{M}$, define $\rho(E, F) = \mu(E \triangle F)$. Then $\rho(E, G) \le \rho(E, F) +
\rho(F, G)$, and hence $\rho$ defines a metric on the space $\mathcal{M}/\sim$ of equivalence classes.


13. Every $\sigma$-finite measure is semifinite.


14. If $\mu$ is a semifinite measure and $\mu(E) = \infty$, for any $C > 0$ there exists $F \subset E$
with $C < \mu(F) < \infty$.


15. Given a measure $\mu$ on $(X, \mathcal{M})$, define $\mu_0$ on $\mathcal{M}$ by $\mu_0(E) = \sup\{\mu(F) : F \subset
E \text{ and } \mu(F) < \infty\}$.

a. $\mu_0$ is a semifinite measure. It is called the semifinite part of $\mu$.

b. If $\mu$ is semifinite, then $\mu = \mu_0$. (Use Exercise 14.)





28 MEASURES 


c. There is a measure $\nu$ on $\mathcal{M}$ (in general, not unique) which assumes only the
values $0$ and $\infty$ such that $\mu = \mu_0 + \nu$.


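16. Let $(X, \mathcal{M}, \mu)$ be a measure space. A set $E \subset X$ is called locally measurable
if $E \cap A \in \mathcal{M}$ for all $A \in \mathcal{M}$ such that $\mu(A) < \infty$. Let $\widetilde{\mathcal{M}}$ be the collection of all
locally measurable sets. Clearly $\mathcal{M} \subset \widetilde{\mathcal{M}}$; if $\mathcal{M} = \widetilde{\mathcal{M}}$, then $\mu$ is called saturated.

a. If $\mu$ is $\sigma$-finite, then $\mu$ is saturated.

b. $\widetilde{\mathcal{M}}$ is a $\sigma$-algebra.

c. Define $\widetilde{\mu}$ on $\widetilde{\mathcal{M}}$ by $\widetilde{\mu}(E) = \mu(E)$ if $E \in \mathcal{M}$ and $\widetilde{\mu}(E) = \infty$ otherwise. Then
$\widetilde{\mu}$ is a saturated measure on $\widetilde{\mathcal{M}}$, called the saturation of $\mu$.

d. If $\mu$ is complete, so is $\widetilde{\mu}$.

e. Suppose that $\mu$ is semifinite. For $E \in \widetilde{\mathcal{M}}$, define $\underline{\mu}(E) = \sup\{\mu(A) : A \in
\mathcal{M} \text{ and } A \subset E\}$. Then $\underline{\mu}$ is a saturated measure on $\widetilde{\mathcal{M}}$ that extends $\mu$.

f. Let $X_1, X_2$ be disjoint uncountable sets, $X = X_1 \cup X_2$, and $\mathcal{M}$ the $\sigma$-algebra
of countable or co-countable sets in $X$. Let $\mu_0$ be counting measure on $\mathcal{P}(X_1)$,
and define $\mu$ on $\mathcal{M}$ by $\mu(E) = \mu_0(E \cap X_1)$. Then $\mu$ is a measure on $\mathcal{M}$,
$\widetilde{\mathcal{M}} = \mathcal{P}(X)$, and in the notation of parts (c) and (e), $\widetilde{\mu} \ne \underline{\mu}$.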
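Exercises 9 and 12 can be sanity-checked (not proved) on a small finite measure space. The sketch below assumes a hypothetical five-point set $X$ with $\mathcal{M} = \mathcal{P}(X)$ and arbitrarily chosen nonnegative point weights; the names `WEIGHTS`, `mu`, and `rho` are ours. It checks the identity of Exercise 9, the triangle inequality of Exercise 12c, and shows why $\rho$ is only a metric on $\mathcal{M}/\sim$: a weight-zero point makes distinct sets have distance $0$.

```python
from fractions import Fraction
from itertools import chain, combinations

# Hypothetical finite measure space: X = {0, 1, 2, 3, 4}, M = P(X), and
# mu(E) = sum of the (arbitrarily chosen, nonnegative) weights of the points of E.
WEIGHTS = {0: Fraction(1), 1: Fraction(1, 2), 2: Fraction(0), 3: Fraction(3), 4: Fraction(1, 3)}
X = frozenset(WEIGHTS)


def mu(E):
    """Measure of a subset E of X (exact rational arithmetic)."""
    return sum((WEIGHTS[x] for x in E), Fraction(0))


def rho(E, F):
    """Symmetric-difference distance rho(E, F) = mu(E symmetric-difference F)."""
    return mu(E ^ F)


def all_subsets(S):
    """All subsets of S as frozensets, i.e. the whole sigma-algebra P(S)."""
    items = sorted(S)
    return [frozenset(c) for c in chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1))]


M = all_subsets(X)

# Exercise 9: mu(E) + mu(F) = mu(E u F) + mu(E n F) for all E, F.
ex9_holds = all(mu(E) + mu(F) == mu(E | F) + mu(E & F) for E in M for F in M)

# Exercise 12c: rho(E, G) <= rho(E, F) + rho(F, G) for all E, F, G.
ex12_holds = all(rho(E, G) <= rho(E, F) + rho(F, G) for E in M for F in M for G in M)

# The point 2 has weight 0, so {2} is a null set and E ~ E u {2} (Exercise 12b):
# rho vanishes on some pairs of distinct sets, so it is a metric only on M / ~.
null_equivalent = all(rho(E, E | {2}) == 0 for E in M)

print(ex9_holds, ex12_holds, null_equivalent)
```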


1.4 OUTER MEASURES 


In this section we develop the tools we shall use to construct measures. To motivate
the ideas, it may be useful to recall the procedure used in calculus to define the area
of a bounded region $E$ in the plane $\mathbb{R}^2$. One draws a grid of rectangles in the plane
and approximates the area of $E$ from below by the sum of the areas of the rectangles
in the grid that are subsets of $E$, and from above by the sum of the areas of the
rectangles in the grid that intersect $E$. The limits of these approximations as the grid
is taken finer and finer give the "inner area" and "outer area" of $E$, and if they are
equal, their common value is the "area" of $E$. (We shall discuss these matters in more
detail in §2.6.) The key idea here is that of outer area, since if $R$ is a large rectangle
containing $E$, the inner area of $E$ is just the area of $R$ minus the outer area of $R \setminus E$.
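The abstract generalization of the notion of outer area is as follows. An outer
measure on a nonempty set $X$ is a function $\mu^* : \mathcal{P}(X) \to [0, \infty]$ that satisfies

• $\mu^*(\varnothing) = 0$,

• $\mu^*(A) \le \mu^*(B)$ if $A \subset B$,

• $\mu^*(\bigcup_1^\infty A_j) \le \sum_1^\infty \mu^*(A_j)$.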


The most common way to obtain outer measures is to start with a family $\mathcal{E}$ of
"elementary sets" on which a notion of measure is defined (such as rectangles in the
plane) and then to approximate arbitrary sets "from the outside" by countable unions
of members of $\mathcal{E}$. The precise construction is as follows.
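1.10 Proposition. Let $\mathcal{E} \subset \mathcal{P}(X)$ and $\rho : \mathcal{E} \to [0, \infty]$ be such that $\varnothing \in \mathcal{E}$, $X \in \mathcal{E}$,
and $\rho(\varnothing) = 0$. For any $A \subset X$, define

$$\mu^*(A) = \inf\Big\{\sum_1^\infty \rho(E_j) : E_j \in \mathcal{E} \text{ and } A \subset \bigcup_1^\infty E_j\Big\}.$$

Then $\mu^*$ is an outer measure.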


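Proof. For any $A \subset X$ there exists $\{E_j\}_1^\infty \subset \mathcal{E}$ such that $A \subset \bigcup_1^\infty E_j$ (take
$E_j = X$ for all $j$), so the definition of $\mu^*$ makes sense. Obviously $\mu^*(\varnothing) = 0$ (take
$E_j = \varnothing$ for all $j$), and $\mu^*(A) \le \mu^*(B)$ for $A \subset B$ because the set over which
the infimum is taken in the definition of $\mu^*(A)$ includes the corresponding set in the
definition of $\mu^*(B)$. To prove the countable subadditivity, suppose $\{A_j\}_1^\infty \subset \mathcal{P}(X)$
and $\epsilon > 0$. For each $j$ there exists $\{E_j^k\}_{k=1}^\infty \subset \mathcal{E}$ such that $A_j \subset \bigcup_{k=1}^\infty E_j^k$ and
$\sum_{k=1}^\infty \rho(E_j^k) \le \mu^*(A_j) + \epsilon 2^{-j}$. But then if $A = \bigcup_1^\infty A_j$, we have $A \subset \bigcup_{j,k=1}^\infty E_j^k$
and $\sum_{j,k} \rho(E_j^k) \le \sum_j \mu^*(A_j) + \epsilon$, whence $\mu^*(A) \le \sum_j \mu^*(A_j) + \epsilon$. Since $\epsilon$ is
arbitrary, we are done. ∎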
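On a finite set $X$ the infimum in Proposition 1.10 can be computed by brute force: since $\rho \ge 0$, repeating a set in a cover never lowers the sum, so it suffices to minimize over subfamilies of $\mathcal{E}$ that cover $A$. The sketch below assumes a hypothetical family $\mathcal{E}$ and function $\rho$ on $X = \{1, 2, 3\}$ (our choice, not from the text) and checks monotonicity and subadditivity. Note that here $\mu^*(X) = 2 < \rho(X) = 3$, so $\mu^*$ need not agree with $\rho$ on $\mathcal{E}$.

```python
from itertools import combinations

# Hypothetical example: X = {1, 2, 3}, a family E of "elementary sets" containing
# the empty set and X, and a function rho on E with rho(empty set) = 0.
X = frozenset({1, 2, 3})
RHO = {
    frozenset(): 0.0,
    frozenset({1}): 1.0,
    frozenset({1, 2}): 1.5,
    frozenset({2, 3}): 1.0,
    X: 3.0,
}


def outer_measure(A):
    """mu*(A) = inf of sum rho(E_j) over covers of A by members of E.

    X is finite and rho >= 0, so it is enough to minimize over the
    finitely many subfamilies of E (each set used at most once).
    """
    family = list(RHO)
    best = float("inf")
    for r in range(len(family) + 1):
        for sub in combinations(family, r):
            if A <= frozenset().union(*sub):
                best = min(best, sum(RHO[E] for E in sub))
    return best


subsets = [frozenset(s) for r in range(len(X) + 1) for s in combinations(sorted(X), r)]

# The two outer-measure axioms beyond mu*(empty) = 0, checked on every pair of subsets.
monotone = all(outer_measure(A) <= outer_measure(B)
               for A in subsets for B in subsets if A <= B)
subadditive = all(outer_measure(A | B) <= outer_measure(A) + outer_measure(B)
                  for A in subsets for B in subsets)

print(outer_measure(frozenset({1, 3})), outer_measure(X), monotone, subadditive)
```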

The fundamental step that leads from outer measures to measures is as follows. If
$\mu^*$ is an outer measure on $X$, a set $A \subset X$ is called $\mu^*$-measurable if

$$\mu^*(E) = \mu^*(E \cap A) + \mu^*(E \cap A^c) \quad \text{for all } E \subset X.$$

Of course, the inequality $\mu^*(E) \le \mu^*(E \cap A) + \mu^*(E \cap A^c)$ holds for any $A$ and $E$,
so to prove that $A$ is $\mu^*$-measurable, it suffices to prove the reverse inequality. The
latter is trivial if $\mu^*(E) = \infty$, so we see that $A$ is $\mu^*$-measurable iff

$$\mu^*(E) \ge \mu^*(E \cap A) + \mu^*(E \cap A^c) \quad \text{for all } E \subset X \text{ such that } \mu^*(E) < \infty.$$
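The reduced criterion above is easy to test exhaustively on a finite set. As a sketch, assume a hypothetical outer measure on $X = \{1, 2, 3, 4\}$ that counts how many blocks of the fixed partition $\{\{1,2\},\{3\},\{4\}\}$ a set meets (this is monotone, subadditive, and vanishes on $\varnothing$). The $\mu^*$-measurable sets it finds are exactly the unions of blocks, a collection closed under complements and unions, as Carathéodory's theorem predicts.

```python
from itertools import combinations

# Hypothetical outer measure on X = {1, 2, 3, 4}: mu*(A) is the number of blocks of
# a fixed partition that A meets. It is monotone, subadditive, and mu*(empty) = 0.
X = frozenset({1, 2, 3, 4})
BLOCKS = [frozenset({1, 2}), frozenset({3}), frozenset({4})]


def outer(A):
    return sum(1 for block in BLOCKS if A & block)


def subsets(S):
    items = sorted(S)
    return [frozenset(c) for r in range(len(items) + 1) for c in combinations(items, r)]


def is_caratheodory_measurable(A):
    """Test mu*(E) >= mu*(E n A) + mu*(E n A^c) for every E (all values are finite here).

    E - A is the set difference, i.e. E intersected with the complement of A.
    """
    return all(outer(E) >= outer(E & A) + outer(E - A) for E in subsets(X))


measurable = [A for A in subsets(X) if is_caratheodory_measurable(A)]
print(sorted(sorted(A) for A in measurable))
```

For instance, $A = \{1\}$ fails with $E = \{1, 2\}$: $\mu^*(E) = 1$ but $\mu^*(E \cap A) + \mu^*(E \cap A^c) = 2$.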


Some motivation for the notion of $\mu^*$-measurability can be obtained by referring
to the discussion at the beginning of this section. If $E$ is a "well-behaved" set such
that $E \supset A$, the equation $\mu^*(E) = \mu^*(E \cap A) + \mu^*(E \cap A^c)$ says that the outer
measure of $A$, $\mu^*(A)$, is equal to the "inner measure" of $A$, $\mu^*(E) - \mu^*(E \cap A^c)$.
The leap from "well-behaved" sets containing $A$ to arbitrary subsets of $X$ is a large
one, but it is justified by the following theorem.


1.11 Carathéodory's Theorem. If $\mu^*$ is an outer measure on $X$, the collection $\mathcal{M}$
of $\mu^*$-measurable sets is a $\sigma$-algebra, and the restriction of $\mu^*$ to $\mathcal{M}$ is a complete
measure.


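Proof. First, we observe that $\mathcal{M}$ is closed under complements since the definition
of $\mu^*$-measurability of $A$ is symmetric in $A$ and $A^c$. Next, if $A, B \in \mathcal{M}$ and $E \subset X$,

$$\begin{aligned}
\mu^*(E) &= \mu^*(E \cap A) + \mu^*(E \cap A^c) \\
&= \mu^*(E \cap A \cap B) + \mu^*(E \cap A \cap B^c) + \mu^*(E \cap A^c \cap B) + \mu^*(E \cap A^c \cap B^c).
\end{aligned}$$

But $A \cup B = (A \cap B) \cup (A \cap B^c) \cup (A^c \cap B)$, so by subadditivity,
$$\mu^*(E \cap A \cap B) + \mu^*(E \cap A \cap B^c) + \mu^*(E \cap A^c \cap B) \ge \mu^*(E \cap (A \cup B)),$$
and hence
$$\mu^*(E) \ge \mu^*(E \cap (A \cup B)) + \mu^*(E \cap (A \cup B)^c).$$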


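It follows that $A \cup B \in \mathcal{M}$, so $\mathcal{M}$ is an algebra. Moreover, if $A, B \in \mathcal{M}$ and
$A \cap B = \varnothing$,
$$\mu^*(A \cup B) = \mu^*((A \cup B) \cap A) + \mu^*((A \cup B) \cap A^c) = \mu^*(A) + \mu^*(B),$$

so $\mu^*$ is finitely additive on $\mathcal{M}$.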

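To show that $\mathcal{M}$ is a $\sigma$-algebra, it will suffice to show that $\mathcal{M}$ is closed under
countable disjoint unions. If $\{A_j\}_1^\infty$ is a sequence of disjoint sets in $\mathcal{M}$, let $B_n =
\bigcup_1^n A_j$ and $B = \bigcup_1^\infty A_j$. Then for any $E \subset X$,

$$\begin{aligned}
\mu^*(E \cap B_n) &= \mu^*(E \cap B_n \cap A_n) + \mu^*(E \cap B_n \cap A_n^c) \\
&= \mu^*(E \cap A_n) + \mu^*(E \cap B_{n-1}),
\end{aligned}$$

so a simple induction shows that $\mu^*(E \cap B_n) = \sum_1^n \mu^*(E \cap A_j)$. Therefore,
$$\mu^*(E) = \mu^*(E \cap B_n) + \mu^*(E \cap B_n^c) \ge \sum_1^n \mu^*(E \cap A_j) + \mu^*(E \cap B^c),$$

and letting $n \to \infty$ we obtain

$$\begin{aligned}
\mu^*(E) &\ge \sum_1^\infty \mu^*(E \cap A_j) + \mu^*(E \cap B^c) \ge \mu^*\Big(\bigcup_1^\infty (E \cap A_j)\Big) + \mu^*(E \cap B^c) \\
&= \mu^*(E \cap B) + \mu^*(E \cap B^c) \ge \mu^*(E).
\end{aligned}$$

All the inequalities in this last calculation are thus equalities. It follows that $B \in \mathcal{M}$
and, taking $E = B$, that $\mu^*(B) = \sum_1^\infty \mu^*(A_j)$, so $\mu^*$ is countably additive on
$\mathcal{M}$. Finally, if $\mu^*(A) = 0$, for any $E \subset X$ we have

$$\mu^*(E) \le \mu^*(E \cap A) + \mu^*(E \cap A^c) = \mu^*(E \cap A^c) \le \mu^*(E),$$

so that $A \in \mathcal{M}$. Therefore $\mu^*|\mathcal{M}$ is a complete measure. ∎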


Our first applications of Carathéodory's theorem will be in the context of extending
measures from algebras to $\sigma$-algebras. More precisely, if $\mathcal{A} \subset \mathcal{P}(X)$ is an algebra, a
function $\mu_0 : \mathcal{A} \to [0, \infty]$ will be called a premeasure if

• $\mu_0(\varnothing) = 0$,

• if $\{A_j\}_1^\infty$ is a sequence of disjoint sets in $\mathcal{A}$ such that $\bigcup_1^\infty A_j \in \mathcal{A}$, then
$\mu_0(\bigcup_1^\infty A_j) = \sum_1^\infty \mu_0(A_j)$.

In particular, a premeasure is finitely additive since one can take $A_j = \varnothing$ for $j$ large.
The notions of finite and $\sigma$-finite premeasures are defined just as for measures. If $\mu_0$
is a premeasure on $\mathcal{A} \subset \mathcal{P}(X)$, it induces an outer measure on $X$ in accordance with
Proposition 1.10, namely,


$$\mu^*(E) = \inf\Big\{\sum_1^\infty \mu_0(A_j) : A_j \in \mathcal{A},\ E \subset \bigcup_1^\infty A_j\Big\}. \tag{1.12}$$


1.13 Proposition. If $\mu_0$ is a premeasure on $\mathcal{A}$ and $\mu^*$ is defined by (1.12), then
a. $\mu^*|\mathcal{A} = \mu_0$;
b. every set in $\mathcal{A}$ is $\mu^*$-measurable.


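Proof. (a) Suppose $E \in \mathcal{A}$. If $E \subset \bigcup_1^\infty A_j$ with $A_j \in \mathcal{A}$, let $B_n = E \cap
(A_n \setminus \bigcup_1^{n-1} A_j)$. Then the $B_n$'s are disjoint members of $\mathcal{A}$ whose union is $E$, so
$\mu_0(E) = \sum_1^\infty \mu_0(B_j) \le \sum_1^\infty \mu_0(A_j)$. It follows that $\mu_0(E) \le \mu^*(E)$, and the
reverse inequality is obvious since $E \subset \bigcup_1^\infty A_j$ where $A_1 = E$ and $A_j = \varnothing$ for
$j > 1$.

(b) If $A \in \mathcal{A}$, $E \subset X$, and $\epsilon > 0$, there is a sequence $\{B_j\} \subset \mathcal{A}$ with
$E \subset \bigcup_1^\infty B_j$ and $\sum_1^\infty \mu_0(B_j) \le \mu^*(E) + \epsilon$. Since $\mu_0$ is additive on $\mathcal{A}$,

$$\mu^*(E) + \epsilon \ge \sum_1^\infty \mu_0(B_j \cap A) + \sum_1^\infty \mu_0(B_j \cap A^c) \ge \mu^*(E \cap A) + \mu^*(E \cap A^c).$$

Since $\epsilon$ is arbitrary, $A$ is $\mu^*$-measurable. ∎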


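1.14 Theorem. Let $\mathcal{A} \subset \mathcal{P}(X)$ be an algebra, $\mu_0$ a premeasure on $\mathcal{A}$, and $\mathcal{M}$ the
$\sigma$-algebra generated by $\mathcal{A}$. There exists a measure $\mu$ on $\mathcal{M}$ whose restriction to $\mathcal{A}$ is
$\mu_0$, namely, $\mu = \mu^*|\mathcal{M}$ where $\mu^*$ is given by (1.12). If $\nu$ is another measure on $\mathcal{M}$
that extends $\mu_0$, then $\nu(E) \le \mu(E)$ for all $E \in \mathcal{M}$, with equality when $\mu(E) < \infty$.
If $\mu_0$ is $\sigma$-finite, then $\mu$ is the unique extension of $\mu_0$ to a measure on $\mathcal{M}$.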


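Proof. The first assertion follows from Carathéodory's theorem and Proposition
1.13 since the $\sigma$-algebra of $\mu^*$-measurable sets includes $\mathcal{A}$ and hence $\mathcal{M}$. As for
the second assertion, if $E \in \mathcal{M}$ and $E \subset \bigcup_1^\infty A_j$ where $A_j \in \mathcal{A}$, then $\nu(E) \le
\sum_1^\infty \nu(A_j) = \sum_1^\infty \mu_0(A_j)$, whence $\nu(E) \le \mu(E)$. Also, if we set $A = \bigcup_1^\infty A_j$, we
have

$$\nu(A) = \lim_{n \to \infty} \nu\Big(\bigcup_1^n A_j\Big) = \lim_{n \to \infty} \mu\Big(\bigcup_1^n A_j\Big) = \mu(A).$$

If $\mu(E) < \infty$, we can choose the $A_j$'s so that $\mu(A) < \mu(E) + \epsilon$, hence $\mu(A \setminus E) < \epsilon$,
and

$$\mu(E) \le \mu(A) = \nu(A) = \nu(E) + \nu(A \setminus E) \le \nu(E) + \mu(A \setminus E) \le \nu(E) + \epsilon.$$

Since $\epsilon$ is arbitrary, $\mu(E) = \nu(E)$. Finally, suppose $X = \bigcup_1^\infty A_j$ with $\mu_0(A_j) < \infty$,
where we can assume that the $A_j$'s are disjoint. Then for any $E \in \mathcal{M}$,

$$\mu(E) = \sum_1^\infty \mu(E \cap A_j) = \sum_1^\infty \nu(E \cap A_j) = \nu(E),$$

so $\nu = \mu$. ∎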
The proof of this theorem yields more than the statement. Indeed, $\mu_0$ may be
extended to a measure on the $\sigma$-algebra $\mathcal{M}^*$ of all $\mu^*$-measurable sets. The relation
between $\mathcal{M}$ and $\mathcal{M}^*$ is explored in Exercise 22 (along with Exercise 20b, which
ensures that the outer measures induced by $\mu_0$ and $\mu$ are the same).


Exercises 


17. If $\mu^*$ is an outer measure on $X$ and $\{A_j\}_1^\infty$ is a sequence of disjoint $\mu^*$-measurable
sets, then $\mu^*(E \cap (\bigcup_1^\infty A_j)) = \sum_1^\infty \mu^*(E \cap A_j)$ for any $E \subset X$.


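18. Let $\mathcal{A} \subset \mathcal{P}(X)$ be an algebra, $\mathcal{A}_\sigma$ the collection of countable unions of sets
in $\mathcal{A}$, and $\mathcal{A}_{\sigma\delta}$ the collection of countable intersections of sets in $\mathcal{A}_\sigma$. Let $\mu_0$ be a
premeasure on $\mathcal{A}$ and $\mu^*$ the induced outer measure.
a. For any $E \subset X$ and $\epsilon > 0$ there exists $A \in \mathcal{A}_\sigma$ with $E \subset A$ and $\mu^*(A) \le
\mu^*(E) + \epsilon$.
b. If $\mu^*(E) < \infty$, then $E$ is $\mu^*$-measurable iff there exists $B \in \mathcal{A}_{\sigma\delta}$ with $E \subset B$
and $\mu^*(B \setminus E) = 0$.
c. If $\mu_0$ is $\sigma$-finite, the restriction $\mu^*(E) < \infty$ in (b) is superfluous.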


19. Let $\mu^*$ be an outer measure on $X$ induced from a finite premeasure $\mu_0$. If
$E \subset X$, define the inner measure of $E$ to be $\mu_*(E) = \mu_0(X) - \mu^*(E^c)$. Then $E$
is $\mu^*$-measurable iff $\mu^*(E) = \mu_*(E)$. (Use Exercise 18.)


20. Let $\mu^*$ be an outer measure on $X$, $\mathcal{M}^*$ the $\sigma$-algebra of $\mu^*$-measurable sets,
$\overline{\mu} = \mu^*|\mathcal{M}^*$, and $\mu^+$ the outer measure induced by $\overline{\mu}$ as in (1.12) (with $\overline{\mu}$ and $\mathcal{M}^*$
replacing $\mu_0$ and $\mathcal{A}$).

a. If $E \subset X$, we have $\mu^*(E) \le \mu^+(E)$, with equality iff there exists $A \in \mathcal{M}^*$
with $A \supset E$ and $\mu^*(A) = \mu^*(E)$.

b. If $\mu^*$ is induced from a premeasure, then $\mu^* = \mu^+$. (Use Exercise 18a.)

c. If $X = \{0, 1\}$, there exists an outer measure $\mu^*$ on $X$ such that $\mu^* \ne \mu^+$.


21. Let $\mu^*$ be an outer measure induced from a premeasure and $\overline{\mu}$ the restriction of
$\mu^*$ to the $\mu^*$-measurable sets. Then $\overline{\mu}$ is saturated. (Use Exercise 18.)


22. Let $(X, \mathcal{M}, \mu)$ be a measure space, $\mu^*$ the outer measure induced by $\mu$ according
to (1.12), $\mathcal{M}^*$ the $\sigma$-algebra of $\mu^*$-measurable sets, and $\overline{\mu} = \mu^*|\mathcal{M}^*$.
a. If $\mu$ is $\sigma$-finite, then $\overline{\mu}$ is the completion of $\mu$. (Use Exercise 18.)
b. In general, $\overline{\mu}$ is the saturation of the completion of $\mu$. (See Exercises 16 and
21.)


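23. Let $\mathcal{A}$ be the collection of finite unions of sets of the form $(a, b] \cap \mathbb{Q}$ where
$-\infty \le a < b \le \infty$.
a. $\mathcal{A}$ is an algebra on $\mathbb{Q}$. (Use Proposition 1.7.)
b. The $\sigma$-algebra generated by $\mathcal{A}$ is $\mathcal{P}(\mathbb{Q})$.
c. Define $\mu_0$ on $\mathcal{A}$ by $\mu_0(\varnothing) = 0$ and $\mu_0(A) = \infty$ for $A \ne \varnothing$. Then $\mu_0$ is a
premeasure on $\mathcal{A}$, and there is more than one measure on $\mathcal{P}(\mathbb{Q})$ whose restriction
to $\mathcal{A}$ is $\mu_0$.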
24. Let $\mu$ be a finite measure on $(X, \mathcal{M})$, and let $\mu^*$ be the outer measure induced by
$\mu$. Suppose that $E \subset X$ satisfies $\mu^*(E) = \mu^*(X)$ (but not that $E \in \mathcal{M}$).
a. If $A, B \in \mathcal{M}$ and $A \cap E = B \cap E$, then $\mu(A) = \mu(B)$.
b. Let $\mathcal{M}_E = \{A \cap E : A \in \mathcal{M}\}$, and define the function $\nu$ on $\mathcal{M}_E$ by
$\nu(A \cap E) = \mu(A)$ (which makes sense by (a)). Then $\mathcal{M}_E$ is a $\sigma$-algebra on $E$
and $\nu$ is a measure on $\mathcal{M}_E$.


1.5 BOREL MEASURES ON THE REAL LINE 


We are now in a position to construct a definitive theory for measuring subsets of $\mathbb{R}$
based on the idea that the measure of an interval is its length. We begin with a more
general (but only slightly more complicated) construction that yields a large family
of measures on $\mathbb{R}$ whose domain is the Borel $\sigma$-algebra $\mathcal{B}_{\mathbb{R}}$; such measures are called
Borel measures on $\mathbb{R}$.

To motivate the ideas, suppose that $\mu$ is a finite Borel measure on $\mathbb{R}$, and let
$F(x) = \mu((-\infty, x])$. ($F$ is sometimes called the distribution function of $\mu$.)
Then $F$ is increasing by Theorem 1.8a and right continuous by Theorem 1.8d since
$(-\infty, x] = \bigcap_1^\infty (-\infty, x_n]$ whenever $x_n \searrow x$. (Recall the discussion of increasing
functions in §0.5.) Moreover, if $b > a$, $(-\infty, b] = (-\infty, a] \cup (a, b]$, so $\mu((a, b]) =
F(b) - F(a)$. Our procedure will be to turn this process around and construct a
measure $\mu$ starting from an increasing, right-continuous function $F$. The special case
$F(x) = x$ will yield the usual "length" measure.

The building blocks for our theory will be the left-open, right-closed intervals in
$\mathbb{R}$, that is, sets of the form $(a, b]$ or $(a, \infty)$ or $\varnothing$, where $-\infty \le a < b < \infty$. In
this section we shall refer to such sets as h-intervals (h for "half-open"). Clearly the
intersection of two h-intervals is an h-interval, and the complement of an h-interval
is an h-interval or the disjoint union of two h-intervals. By Proposition 1.7, the
collection $\mathcal{A}$ of finite disjoint unions of h-intervals is an algebra, and by Proposition
1.2, the $\sigma$-algebra generated by $\mathcal{A}$ is $\mathcal{B}_{\mathbb{R}}$.


1.15 Proposition. Let $F : \mathbb{R} \to \mathbb{R}$ be increasing and right continuous. If $(a_j, b_j]$
$(j = 1, \ldots, n)$ are disjoint h-intervals, let

$$\mu_0\Big(\bigcup_1^n (a_j, b_j]\Big) = \sum_1^n [F(b_j) - F(a_j)],$$

and let $\mu_0(\varnothing) = 0$. Then $\mu_0$ is a premeasure on the algebra $\mathcal{A}$.


Proof. First we must check that $\mu_0$ is well defined, since elements of $\mathcal{A}$ can be
represented in more than one way as disjoint unions of h-intervals. If $\{(a_j, b_j]\}_1^n$
are disjoint and $\bigcup_1^n (a_j, b_j] = (a, b]$, then, after perhaps relabeling the index $j$, we
must have $a = a_1 < b_1 = a_2 < b_2 = \cdots < b_n = b$, so $\sum_1^n [F(b_j) - F(a_j)] =
F(b) - F(a)$. More generally, if $\{I_i\}_1^n$ and $\{J_j\}_1^m$ are finite sequences of disjoint
h-intervals such that $\bigcup_1^n I_i = \bigcup_1^m J_j$, this reasoning shows that
$$\sum_i \mu_0(I_i) = \sum_{i,j} \mu_0(I_i \cap J_j) = \sum_j \mu_0(J_j).$$

Thus $\mu_0$ is well defined, and it is finitely additive by construction.

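It remains to show that if $\{I_j\}_1^\infty$ is a sequence of disjoint h-intervals with $\bigcup_1^\infty I_j \in
\mathcal{A}$, then $\mu_0(\bigcup_1^\infty I_j) = \sum_1^\infty \mu_0(I_j)$. Since $\bigcup_1^\infty I_j$ is a finite union of h-intervals, the
sequence $\{I_j\}_1^\infty$ can be partitioned into finitely many subsequences such that the
union of the intervals in each subsequence is a single h-interval. By considering each
subsequence separately and using the finite additivity of $\mu_0$, we may assume that
$\bigcup_1^\infty I_j$ is an h-interval $I = (a, b]$. In this case, we have

$$\mu_0(I) = \mu_0\Big(\bigcup_1^n I_j\Big) + \mu_0\Big(I \setminus \bigcup_1^n I_j\Big) \ge \mu_0\Big(\bigcup_1^n I_j\Big) = \sum_1^n \mu_0(I_j).$$

Letting $n \to \infty$, we obtain $\mu_0(I) \ge \sum_1^\infty \mu_0(I_j)$. To prove the reverse inequality,
let us suppose first that $a$ and $b$ are finite, and let us fix $\epsilon > 0$. Since $F$ is right
continuous, there exists $\delta > 0$ such that $F(a + \delta) - F(a) < \epsilon$, and if $I_j = (a_j, b_j]$,
for each $j$ there exists $\delta_j > 0$ such that $F(b_j + \delta_j) - F(b_j) < \epsilon 2^{-j}$. The open
intervals $(a_j, b_j + \delta_j)$ cover the compact set $[a + \delta, b]$, so there is a finite subcover.
By discarding any $(a_j, b_j + \delta_j)$ that is contained in a larger one and relabeling the
index $j$, we may assume that

• the intervals $(a_1, b_1 + \delta_1), \ldots, (a_N, b_N + \delta_N)$ cover $[a + \delta, b]$,
• $b_j + \delta_j \in (a_{j+1}, b_{j+1} + \delta_{j+1})$ for $j = 1, \ldots, N - 1$.

But then

$$\begin{aligned}
\mu_0(I) &< F(b) - F(a + \delta) + \epsilon \\
&\le F(b_N + \delta_N) - F(a_1) + \epsilon \\
&= F(b_N + \delta_N) - F(a_N) + \sum_1^{N-1} [F(a_{j+1}) - F(a_j)] + \epsilon \\
&\le F(b_N + \delta_N) - F(a_N) + \sum_1^{N-1} [F(b_j + \delta_j) - F(a_j)] + \epsilon \\
&< \sum_1^N [F(b_j) + \epsilon 2^{-j} - F(a_j)] + \epsilon \le \sum_1^\infty \mu_0(I_j) + 2\epsilon.
\end{aligned}$$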


Since $\epsilon$ is arbitrary, we are done when $a$ and $b$ are finite. If $a = -\infty$, for any
$M < \infty$ the intervals $(a_j, b_j + \delta_j)$ cover $[-M, b]$, so the same reasoning gives
$F(b) - F(-M) \le \sum_1^\infty \mu_0(I_j) + 2\epsilon$, whereas if $b = \infty$, for any $M < \infty$ we
likewise obtain $F(M) - F(a) \le \sum_1^\infty \mu_0(I_j) + 2\epsilon$. The desired result then follows
by letting $\epsilon \to 0$ and $M \to \infty$. ∎
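The well-definedness argument and the role of right continuity can be checked numerically. The sketch below assumes a hypothetical $F(x) = x + \lfloor x \rfloor$ (increasing, right continuous, with a unit jump at each integer) and exact rational arithmetic; `mu0` and `refine` are our illustrative names. Refining a disjoint union of h-intervals leaves $\mu_0$ unchanged, and the decomposition $(0, 1] = \bigcup_{n \ge 1} (\frac{1}{n+1}, \frac1n]$ has partial sums $F(1) - F(\frac{1}{N+1}) = 2 - \frac{1}{N+1}$, which converge to $\mu_0((0, 1]) = 2$ precisely because $F$ is right continuous at $0$.

```python
from fractions import Fraction
from math import floor


def F(x):
    """Hypothetical increasing, right-continuous F: x plus a unit jump at each integer."""
    return x + floor(x)


def mu0(intervals):
    """mu_0 of a finite disjoint union of bounded h-intervals (a, b], given as (a, b) pairs."""
    return sum((F(b) - F(a) for a, b in intervals), Fraction(0))


def refine(intervals, cuts):
    """Split each (a, b] at the cut points lying strictly inside it.

    The result is another disjoint decomposition of the same set into h-intervals.
    """
    pieces = []
    for a, b in intervals:
        points = [a] + sorted(c for c in cuts if a < c < b) + [b]
        pieces.extend(zip(points, points[1:]))
    return pieces


A = [(Fraction(0), Fraction(2)), (Fraction(3), Fraction(7, 2))]
finer = refine(A, [Fraction(1, 2), Fraction(1), Fraction(13, 4)])

# Countable decomposition (0, 1] = union of (1/(n+1), 1/n], truncated after N pieces.
N = 100
tail_pieces = [(Fraction(1, n + 1), Fraction(1, n)) for n in range(1, N + 1)]

print(mu0(A), mu0(finer), mu0(tail_pieces), mu0([(Fraction(0), Fraction(1))]))
```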
1.16 Theorem. If $F : \mathbb{R} \to \mathbb{R}$ is any increasing, right continuous function, there is
a unique Borel measure $\mu_F$ on $\mathbb{R}$ such that $\mu_F((a, b]) = F(b) - F(a)$ for all $a, b$. If
$G$ is another such function, we have $\mu_F = \mu_G$ iff $F - G$ is constant. Conversely, if
$\mu$ is a Borel measure on $\mathbb{R}$ that is finite on all bounded Borel sets and we define

$$F(x) = \begin{cases} \mu((0, x]) & \text{if } x > 0, \\ 0 & \text{if } x = 0, \\ -\mu((x, 0]) & \text{if } x < 0, \end{cases}$$

then $F$ is increasing and right continuous, and $\mu = \mu_F$.


Proof. Each $F$ induces a premeasure on $\mathcal{A}$ by Proposition 1.15. It is clear that $F$
and $G$ induce the same premeasure iff $F - G$ is constant, and that these premeasures
are $\sigma$-finite (since $\mathbb{R} = \bigcup_{j=-\infty}^{\infty} (j, j+1]$). The first two assertions therefore follow from
Theorem 1.14. As for the last one, the monotonicity of $\mu$ implies the monotonicity
of $F$, and the continuity of $\mu$ from above and below implies the right continuity of $F$
for $x \ge 0$ and $x < 0$. It is evident that $\mu = \mu_F$ on $\mathcal{A}$, and hence $\mu = \mu_F$ on $\mathcal{B}_{\mathbb{R}}$ by
the uniqueness in Theorem 1.14. ∎
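The converse half of Theorem 1.16 can be illustrated with a concrete measure. As an assumption for this sketch, take $\mu = m + \delta_0$ (length plus a unit point mass at $0$), evaluated only on h-intervals. The recipe of the theorem gives $F(x) = x$ for $x \ge 0$ and $F(x) = x - 1$ for $x < 0$: increasing, right continuous, with a jump of size $1 = \mu(\{0\})$ at $0$. The code checks $\mu((a, b]) = F(b) - F(a)$ on a grid of rational endpoints.

```python
from fractions import Fraction


def mu(a, b):
    """Hypothetical Borel measure mu = m + delta_0 on the h-interval (a, b], a < b:
    the length b - a, plus 1 if the point 0 lies in (a, b]."""
    return (b - a) + (1 if a < 0 <= b else 0)


def F(x):
    """The function built from mu as in Theorem 1.16 (normalized so F(0) = 0)."""
    if x > 0:
        return mu(Fraction(0), x)
    if x == 0:
        return Fraction(0)
    return -mu(x, Fraction(0))


grid = [Fraction(k, 4) for k in range(-8, 9)]
agrees = all(mu(a, b) == F(b) - F(a) for a in grid for b in grid if a < b)

# Left limit at 0: F(-1/n) = -1/n - 1 tends to -1, so the jump F(0) - F(0-) equals 1.
print(agrees, F(Fraction(-1, 1000)), F(Fraction(0)), F(Fraction(1, 1000)))
```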


Several remarks are in order. First, this theory could equally well be developed
by using intervals of the form $[a, b)$ and left continuous functions $F$. Second, if
$\mu$ is a finite Borel measure on $\mathbb{R}$, then $\mu = \mu_F$ where $F(x) = \mu((-\infty, x])$ is the
cumulative distribution function of $\mu$; this differs from the $F$ specified in Theorem
1.16 by the constant $\mu((-\infty, 0])$. Third, the theory of §1.4 gives, for each increasing
and right continuous $F$, not only the Borel measure $\mu_F$ but a complete measure $\overline{\mu}_F$
whose domain includes $\mathcal{B}_{\mathbb{R}}$. In fact, $\overline{\mu}_F$ is just the completion of $\mu_F$ (Exercise 22a
or Theorem 1.19 below), and one can show that its domain is always strictly larger
than $\mathcal{B}_{\mathbb{R}}$. We shall usually denote this complete measure also by $\mu_F$; it is called the
Lebesgue-Stieltjes measure associated to $F$.

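Lebesgue-Stieltjes measures enjoy some useful regularity properties that we now
investigate. In this discussion we fix a complete Lebesgue-Stieltjes measure $\mu$ on $\mathbb{R}$
associated to the increasing, right continuous function $F$, and we denote by $\mathcal{M}_\mu$ the
domain of $\mu$. Thus, for any $E \in \mathcal{M}_\mu$,

$$\begin{aligned}
\mu(E) &= \inf\Big\{\sum_1^\infty [F(b_j) - F(a_j)] : E \subset \bigcup_1^\infty (a_j, b_j]\Big\} \\
&= \inf\Big\{\sum_1^\infty \mu((a_j, b_j]) : E \subset \bigcup_1^\infty (a_j, b_j]\Big\}.
\end{aligned}$$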


We first observe that in the second formula for $\mu(E)$ we can replace h-intervals by
open intervals:


1.17 Lemma. For any $E \in \mathcal{M}_\mu$,

$$\mu(E) = \inf\Big\{\sum_1^\infty \mu((a_j, b_j)) : E \subset \bigcup_1^\infty (a_j, b_j)\Big\}.$$







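Proof. Let us call the quantity on the right $\nu(E)$. Suppose $E \subset \bigcup_1^\infty (a_j, b_j)$.
Each $(a_j, b_j)$ is a countable disjoint union of h-intervals $I_j^k$ $(k = 1, 2, \ldots)$; specifically,
$I_j^k = (c_j^k, c_j^{k+1}]$ where $\{c_j^k\}$ is any sequence such that $c_j^1 = a_j$ and $c_j^k$ increases
to $b_j$ as $k \to \infty$. Thus $E \subset \bigcup_{j,k=1}^\infty I_j^k$, so

$$\sum_1^\infty \mu((a_j, b_j)) = \sum_{j,k=1}^\infty \mu(I_j^k) \ge \mu(E),$$

and hence $\nu(E) \ge \mu(E)$. On the other hand, given $\epsilon > 0$ there exists $\{(a_j, b_j]\}_1^\infty$
with $E \subset \bigcup_1^\infty (a_j, b_j]$ and $\sum_1^\infty \mu((a_j, b_j]) \le \mu(E) + \epsilon$, and for each $j$ there exists
$\delta_j > 0$ such that $F(b_j + \delta_j) - F(b_j) < \epsilon 2^{-j}$. Then $E \subset \bigcup_1^\infty (a_j, b_j + \delta_j)$ and

$$\sum_1^\infty \mu((a_j, b_j + \delta_j)) \le \sum_1^\infty \mu((a_j, b_j]) + \epsilon \le \mu(E) + 2\epsilon,$$

so that $\nu(E) \le \mu(E)$. ∎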


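1.18 Theorem. If $E \in \mathcal{M}_\mu$, then

$$\begin{aligned}
\mu(E) &= \inf\{\mu(U) : U \supset E \text{ and } U \text{ is open}\} \\
&= \sup\{\mu(K) : K \subset E \text{ and } K \text{ is compact}\}.
\end{aligned}$$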


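Proof. By Lemma 1.17, for any $\epsilon > 0$ there exist intervals $(a_j, b_j)$ such that
$E \subset \bigcup_1^\infty (a_j, b_j)$ and $\mu(E) \le \sum_1^\infty \mu((a_j, b_j)) \le \mu(E) + \epsilon$. If $U = \bigcup_1^\infty (a_j, b_j)$, then $U$ is
open, $U \supset E$, and $\mu(U) \le \mu(E) + \epsilon$. On the other hand, $\mu(U) \ge \mu(E)$ whenever
$U \supset E$, so the first equality is valid. For the second one, suppose first that $E$ is
bounded. If $E$ is closed, then $E$ is compact and the equality is obvious. Otherwise,
given $\epsilon > 0$ we can choose an open $U \supset \overline{E} \setminus E$ such that $\mu(U) \le \mu(\overline{E} \setminus E) + \epsilon$. Let
$K = \overline{E} \setminus U$. Then $K$ is compact, $K \subset E$, and

$$\begin{aligned}
\mu(K) &= \mu(E) - \mu(E \cap U) = \mu(E) - [\mu(U) - \mu(U \setminus E)] \\
&\ge \mu(E) - \mu(U) + \mu(\overline{E} \setminus E) \ge \mu(E) - \epsilon.
\end{aligned}$$

If $E$ is unbounded, let $E_j = E \cap (j, j + 1]$. By the preceding argument, for
any $\epsilon > 0$ there exist compact $K_j \subset E_j$ with $\mu(K_j) \ge \mu(E_j) - \epsilon 3^{-|j|}$. Let
$H_n = \bigcup_{-n}^{n} K_j$. Then $H_n$ is compact, $H_n \subset E$, and $\mu(H_n) \ge \mu(\bigcup_{-n}^{n} E_j) - 2\epsilon$, since $\sum_{j \in \mathbb{Z}} 3^{-|j|} = 2$.
Since $\mu(E) = \lim_{n \to \infty} \mu(\bigcup_{-n}^{n} E_j)$, the result follows. ∎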


1.19 Theorem. If $E \subset \mathbb{R}$, the following are equivalent.
a. $E \in \mathcal{M}_\mu$.
b. $E = V \setminus N_1$ where $V$ is a $G_\delta$ set and $\mu(N_1) = 0$.
c. $E = H \cup N_2$ where $H$ is an $F_\sigma$ set and $\mu(N_2) = 0$.
Proof. Obviously (b) and (c) each imply (a) since $\mu$ is complete on $\mathcal{M}_\mu$. Suppose
$E \in \mathcal{M}_\mu$ and $\mu(E) < \infty$. By Theorem 1.18, for $j \in \mathbb{N}$ we can choose an open
$U_j \supset E$ and a compact $K_j \subset E$ such that

$$\mu(U_j) - 2^{-j} \le \mu(E) \le \mu(K_j) + 2^{-j}.$$

Let $V = \bigcap_1^\infty U_j$ and $H = \bigcup_1^\infty K_j$. Then $H \subset E \subset V$ and $\mu(V) = \mu(H) =
\mu(E) < \infty$, so $\mu(V \setminus E) = \mu(E \setminus H) = 0$. The result is thus proved when
$\mu(E) < \infty$; the extension to the general case is left to the reader (Exercise 25). ∎


The significance of Theorem 1.19 is that all Borel sets (or, more generally, all sets
in $\mathcal{M}_\mu$) are of a reasonably simple form modulo sets of measure zero. This contrasts
markedly with the machinations necessary to construct the Borel sets from the open 
sets when null sets are not excepted; see Proposition 1.23 below. Another version 
of the idea that general measurable sets can be approximated by “simple” sets is 
contained in the following proposition, whose proof is left to the reader (Exercise 
26): 


1.20 Proposition. If $E \in \mathcal{M}_\mu$ and $\mu(E) < \infty$, then for every $\epsilon > 0$ there is a set $A$
that is a finite union of open intervals such that $\mu(E \triangle A) < \epsilon$.


We now examine the most important measure on $\mathbb{R}$, namely, Lebesgue measure:
This is the complete measure $\mu_F$ associated to the function $F(x) = x$, for which the
measure of an interval is simply its length. We shall denote it by $m$. The domain of
$m$ is called the class of Lebesgue measurable sets, and we shall denote it by $\mathcal{L}$. We
shall also refer to the restriction of $m$ to $\mathcal{B}_{\mathbb{R}}$ as Lebesgue measure.

Among the most significant properties of Lebesgue measure are its invariance
under translations and simple behavior under dilations. If $E \subset \mathbb{R}$ and $s, r \in \mathbb{R}$, we
define

$$E + s = \{x + s : x \in E\}, \qquad rE = \{rx : x \in E\}.$$


1.21 Theorem. If $E \in \mathcal{L}$, then $E + s \in \mathcal{L}$ and $rE \in \mathcal{L}$ for all $s, r \in \mathbb{R}$. Moreover,
$m(E + s) = m(E)$ and $m(rE) = |r|\, m(E)$.


Proof. Since the collection of open intervals is invariant under translations and
dilations, the same is true of $\mathcal{B}_{\mathbb{R}}$. For $E \in \mathcal{B}_{\mathbb{R}}$, let $m_s(E) = m(E + s)$ and
$m^r(E) = m(rE)$. Then $m_s$ and $m^r$ clearly agree with $m$ and $|r|m$ on finite unions
of intervals, hence on $\mathcal{B}_{\mathbb{R}}$ by Theorem 1.14. In particular, if $E \in \mathcal{B}_{\mathbb{R}}$ and $m(E) = 0$,
then $m(E + s) = m(rE) = 0$, from which it follows that the class of sets of
Lebesgue measure zero is preserved by translations and dilations. It follows that $\mathcal{L}$
(the members of which are a union of a Borel set and a Lebesgue null set) is preserved
by translations and dilations and that $m(E + s) = m(E)$ and $m(rE) = |r|\, m(E)$ for
all $E \in \mathcal{L}$. ∎
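The first step of the proof, that translation and dilation act as expected on finite unions of intervals, can be checked directly. The sketch below assumes sets given as finite lists of bounded intervals (endpoints ignored, since points are null) and uses our own helper names. Note that for $r < 0$ each interval's endpoints swap, which is where the absolute value in $|r|\, m(E)$ comes from.

```python
from fractions import Fraction


def length_of_union(intervals):
    """Lebesgue measure of a finite union of bounded intervals (a, b) with a <= b.

    Overlapping intervals are merged; open/closed endpoints are irrelevant
    because single points have measure zero.
    """
    total, start, end = Fraction(0), None, None
    for a, b in sorted(intervals):
        if end is None or a > end:
            if end is not None:
                total += end - start
            start, end = a, b
        else:
            end = max(end, b)
    if end is not None:
        total += end - start
    return total


def translate(intervals, s):
    return [(a + s, b + s) for a, b in intervals]


def dilate(intervals, r):
    # For r < 0 the endpoints swap order, so sort each image pair.
    return [tuple(sorted((r * a, r * b))) for a, b in intervals]


E = [(Fraction(0), Fraction(1)), (Fraction(1, 2), Fraction(2)), (Fraction(3), Fraction(7, 2))]
print(length_of_union(E),
      length_of_union(translate(E, Fraction(-5, 3))),
      length_of_union(dilate(E, Fraction(-3))))
```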


The relation between the measure-theoretic and topological properties of subsets 
of $\mathbb{R}$ is delicate and contains some surprises. Consider the following facts. Every
singleton set in $\mathbb{R}$ has Lebesgue measure zero, and hence so does every countable
set. In particular, $m(\mathbb{Q}) = 0$. Let $\{r_j\}_1^\infty$ be an enumeration of the rational numbers
in $[0, 1]$, and given $\epsilon > 0$, let $I_j$ be the interval centered at $r_j$ of length $\epsilon 2^{-j}$. Then
the set $U = (0, 1) \cap \bigcup_1^\infty I_j$ is open and dense in $[0, 1]$, but $m(U) \le \sum_1^\infty \epsilon 2^{-j} = \epsilon$;
its complement $K = [0, 1] \setminus U$ is closed and nowhere dense, but $m(K) \ge 1 - \epsilon$.
Thus a set that is open and dense, and hence topologically “large,” can be measure- 
theoretically small, and a set that is nowhere dense, and hence topologically “small,” 
can be measure-theoretically large. (A nonempty open set cannot have Lebesgue 
measure zero, however.) 
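The estimate $m(U) \le \epsilon$ can be watched on a finite truncation of the construction. As assumptions for this sketch, the rationals in $[0, 1]$ are enumerated by increasing denominator (any enumeration works), only those with denominator at most $30$ are used, and $\epsilon = \frac{1}{10}$; the union length is computed exactly by merging overlapping intervals.

```python
from fractions import Fraction


def rationals_in_unit_interval(max_den):
    """Enumerate the rationals in [0, 1] with denominator <= max_den, each exactly once."""
    seen = set()
    for q in range(1, max_den + 1):
        for p in range(q + 1):
            r = Fraction(p, q)
            if r not in seen:
                seen.add(r)
                yield r


def union_length(intervals):
    """Exact length of a finite union of intervals, merging overlaps."""
    total, start, end = Fraction(0), None, None
    for a, b in sorted(intervals):
        if end is None or a > end:
            if end is not None:
                total += end - start
            start, end = a, b
        else:
            end = max(end, b)
    if end is not None:
        total += end - start
    return total


eps = Fraction(1, 10)
intervals = []
for j, r in enumerate(rationals_in_unit_interval(30), start=1):
    half = eps / 2 ** (j + 1)            # I_j is centered at r_j with length eps * 2**-j
    a = max(r - half, Fraction(0))        # intersect with (0, 1)
    b = min(r + half, Fraction(1))
    intervals.append((a, b))

measure_U = union_length(intervals)
print(len(intervals), float(measure_U), float(1 - measure_U))
```

Every rational with denominator at most 30 lies in the closure of this truncated $U$, yet its length stays below $\epsilon$, and the complement in $[0, 1]$ keeps measure at least $1 - \epsilon$.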

The Lebesgue null sets include not only all countable sets but many sets having 
the cardinality of the continuum. We now present the standard example, the Cantor 
set, which is also of interest for other reasons. 

Each $x \in [0, 1]$ has a base-3 decimal expansion $x = \sum_1^\infty a_j 3^{-j}$ where $a_j = 0$, $1$,
or $2$. This expansion is unique unless $x$ is of the form $p 3^{-k}$ for some integers $p, k$, in
which case $x$ has two expansions: one with $a_j = 0$ for $j > k$ and one with $a_j = 2$ for
$j > k$. Assuming $p$ is not divisible by $3$, one of these expansions will have $a_k = 1$
and the other will have $a_k = 0$ or $2$. If we agree always to use the latter expansion,
we see that

$$\begin{aligned}
a_1 = 1 &\iff \tfrac13 < x < \tfrac23, \\
a_1 \ne 1 \text{ and } a_2 = 1 &\iff \tfrac19 < x < \tfrac29 \text{ or } \tfrac79 < x < \tfrac89,
\end{aligned}$$

and so forth. It will also be useful to observe that if $x = \sum_1^\infty a_j 3^{-j}$ and $y = \sum_1^\infty b_j 3^{-j}$,
then $x < y$ iff there exists an $n$ such that $a_n < b_n$ and $a_j = b_j$ for $j < n$.

The Cantor set $C$ is the set of all $x \in [0, 1]$ that have a base-3 expansion
$x = \sum_1^\infty a_j 3^{-j}$ with $a_j \ne 1$ for all $j$. Thus $C$ is obtained from $[0, 1]$ by removing the
open middle third $(\frac13, \frac23)$, then removing the open middle thirds $(\frac19, \frac29)$ and $(\frac79, \frac89)$ of
the two remaining intervals, and so forth. The basic properties of $C$ are summarized
as follows:


1.22 Proposition. Let C be the Cantor set. 


a. C is compact, nowhere dense, and totally disconnected (i.e., the only connected 
subsets of C are single points). Moreover, C has no isolated points. 

b. m(C) = 0. 

c. $\operatorname{card}(C) = \mathfrak{c}$.


Proof. We leave the proof of (a) to the reader (Exercise 27). As for (b), $C$ is
obtained from $[0, 1]$ by removing one interval of length $\frac13$, two intervals of length $\frac19$,
and so forth. Thus

$$m(C) = 1 - \sum_0^\infty \frac{2^j}{3^{j+1}} = 1 - \frac{1/3}{1 - (2/3)} = 0.$$

Lastly, suppose $x \in C$, so that $x = \sum_1^\infty a_j 3^{-j}$ where $a_j = 0$ or $2$ for all $j$.
Let $f(x) = \sum_1^\infty b_j 2^{-j}$ where $b_j = a_j/2$. The series defining $f(x)$ is the base-2
expansion of a number in $[0, 1]$, and any number in $[0, 1]$ can be obtained in this way.
Hence $f$ maps $C$ onto $[0, 1]$, and (c) follows. ∎
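The geometric series in part (b) can be checked stage by stage with exact arithmetic. In this sketch (helper name ours), after $n$ stages the removed length is $\sum_0^{n-1} 2^j/3^{j+1}$, so the remaining set has measure $(2/3)^n$, which tends to $0$.

```python
from fractions import Fraction


def removed_length(n):
    """Total length removed in the first n stages: sum_{j=0}^{n-1} 2**j / 3**(j+1)."""
    return sum((Fraction(2 ** j, 3 ** (j + 1)) for j in range(n)), Fraction(0))


for n in (1, 2, 5, 20):
    remaining = 1 - removed_length(n)
    # The n-th stage set is a union of 2**n intervals of length 3**-n.
    assert remaining == Fraction(2, 3) ** n
    print(n, remaining, float(remaining))
```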
Let us examine the map $f$ in the preceding proof more closely. One readily sees
that if $x, y \in C$ and $x < y$, then $f(x) < f(y)$ unless $x$ and $y$ are the two endpoints
of one of the intervals removed from $[0, 1]$ to obtain $C$. In this case $f(x) = p 2^{-k}$ for
some integers $p, k$, and $f(x)$ and $f(y)$ are the two base-2 expansions of this number.
We can therefore extend $f$ to a map from $[0, 1]$ to itself by declaring it to be constant
on each interval missing from $C$. This extended $f$ is still increasing, and since its
range is all of $[0, 1]$ it cannot have any jump discontinuities; hence it is continuous.
$f$ is called the Cantor function or Cantor-Lebesgue function.
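The digit map $b_j = a_j/2$ and the "constant on removed intervals" rule translate into a short procedure: read base-3 digits of $x$; turn each $0$ or $2$ into a base-2 digit $0$ or $1$; at the first digit $1$ the point lies in a removed middle third, where $f$ equals the value at that interval's endpoints. This is a sketch for rational inputs only, with a truncation parameter `digits` of our choosing; it is exact when the expansion terminates or reaches a $1$ within that many digits, and otherwise accurate to within $2^{-\text{digits}}$.

```python
from fractions import Fraction
from math import floor


def cantor_function(x, digits=60):
    """Approximate the Cantor function f at a rational x in [0, 1] via base-3 digits.

    Digits 0/2 become base-2 digits 0/1 (b_j = a_j / 2); the first digit 1 means x
    lies in a removed middle third (or is its left endpoint), where f is constant.
    """
    x = Fraction(x)
    if x == 1:
        return Fraction(1)
    value = Fraction(0)
    for k in range(1, digits + 1):
        x *= 3
        d = floor(x)          # k-th base-3 digit
        x -= d
        if d == 1:
            return value + Fraction(1, 2 ** k)
        value += Fraction(d // 2, 2 ** k)
    return value


grid = [cantor_function(Fraction(k, 81)) for k in range(82)]
print(cantor_function(Fraction(1, 3)), cantor_function(Fraction(1, 2)),
      cantor_function(Fraction(7, 9)), float(cantor_function(Fraction(1, 4))))
```

For example, $\frac14 = 0.0202\ldots$ in base 3, so $f(\frac14) = 0.0101\ldots$ in base 2, that is, $\frac13$.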

The construction of the Cantor set by starting with [0,1] and successively removing 
open middle thirds of intervals has an obvious generalization. If $I$ is a bounded
interval and $\alpha \in (0, 1)$, let us call the open interval with the same midpoint as $I$ and
length equal to $\alpha$ times the length of $I$ the "open middle $\alpha$th" of $I$. If $\{\alpha_j\}_1^\infty$ is
any sequence of numbers in $(0, 1)$, then we can define a decreasing sequence $\{K_j\}$
of closed sets as follows: $K_0 = [0, 1]$, and $K_j$ is obtained by removing the open
middle $\alpha_j$th from each of the intervals that make up $K_{j-1}$. The resulting limiting
set $K = \bigcap_0^\infty K_j$ is called a generalized Cantor set. Generalized Cantor sets all
share with the ordinary Cantor set the properties (a) and (c) in Proposition 1.22. As
for their Lebesgue measure, clearly $m(K_j) = (1 - \alpha_j)\, m(K_{j-1})$, so $m(K)$ is the
infinite product $\prod_1^\infty (1 - \alpha_j) = \lim_{n \to \infty} \prod_1^n (1 - \alpha_j)$. If the $\alpha_j$ are all equal to a fixed
$\alpha \in (0, 1)$ (for example, $\alpha = \frac13$ for the ordinary Cantor set), we have $m(K) = 0$.
However, if $\alpha_j \to 0$ sufficiently rapidly as $j \to \infty$, $m(K)$ will be positive, and for
any $\beta \in (0, 1)$ one can choose $\alpha_j$ so that $m(K)$ will equal $\beta$; see Exercise 32. This
gives another way of constructing nowhere dense sets of positive measure. 
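The recursion $m(K_j) = (1 - \alpha_j)\, m(K_{j-1})$ is easy to iterate exactly. As an illustrative assumption, compare the constant choice $\alpha_j = \frac13$ with the hypothetical choice $\alpha_j = (j+1)^{-2}$, for which $\sum \alpha_j < \infty$. The second product telescopes: $\prod_{k=2}^{n+1}(1 - k^{-2}) = \prod_{k=2}^{n+1} \frac{(k-1)(k+1)}{k^2} = \frac{n+2}{2(n+1)}$, so that generalized Cantor set has measure $\frac12$.

```python
from fractions import Fraction


def stage_measure(alpha, n):
    """m(K_n) = prod_{j=1}^n (1 - alpha(j)), from m(K_j) = (1 - alpha_j) m(K_{j-1}), m(K_0) = 1."""
    m = Fraction(1)
    for j in range(1, n + 1):
        m *= 1 - alpha(j)
    return m


def middle_thirds(j):
    """Ordinary Cantor set: alpha_j = 1/3 for every j."""
    return Fraction(1, 3)


def shrinking(j):
    """Hypothetical choice alpha_j = 1/(j+1)**2, whose sum is finite."""
    return Fraction(1, (j + 1) ** 2)


for n in (1, 10, 100):
    print(n, float(stage_measure(middle_thirds, n)), float(stage_measure(shrinking, n)))
```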

Not every Lebesgue measurable set is a Borel set. One can display examples of 
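sets in $\mathcal{L} \setminus \mathcal{B}_{\mathbb{R}}$ by using the Cantor function; see Exercise 9 in Chapter 2. Alternatively,
one can observe that since every subset of the Cantor set is Lebesgue measurable, we
have $\operatorname{card}(\mathcal{L}) = \operatorname{card}(\mathcal{P}(\mathbb{R})) > \mathfrak{c}$, whereas $\operatorname{card}(\mathcal{B}_{\mathbb{R}}) = \mathfrak{c}$. The latter fact follows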
from Proposition 1.23 below. 


Exercises 
25. Complete the proof of Theorem 1.19. 
26. Prove Proposition 1.20. (Use Theorem 1.18.) 


27. Prove Proposition 1.22a. (Show that if $x, y \in C$ and $x < y$, there exists $z \notin C$
such that $x < z < y$.)


28. Let $F$ be increasing and right continuous, and let $\mu_F$ be the associated measure.
Then $\mu_F(\{a\}) = F(a) - F(a-)$, $\mu_F([a, b)) = F(b-) - F(a-)$, $\mu_F([a, b]) =
F(b) - F(a-)$, and $\mu_F((a, b)) = F(b-) - F(a)$.


29. Let $E$ be a Lebesgue measurable set.
a. If $E \subset N$ where $N$ is the nonmeasurable set described in §1.1, then $m(E) = 0$.
b. If $m(E) > 0$, then $E$ contains a nonmeasurable set. (It suffices to assume
$E \subset [0, 1]$. In the notation of §1.1, $E = \bigcup_{r \in R} E \cap N_r$.)
30. If $E \in \mathcal{L}$ and $m(E) > 0$, for any $\alpha < 1$ there is an open interval $I$ such that
$m(E \cap I) > \alpha\, m(I)$.


31. If $E \in \mathcal{L}$ and $m(E) > 0$, the set $E - E = \{x - y : x, y \in E\}$ contains an
interval centered at $0$. (If $I$ is as in Exercise 30 with $\alpha > \frac34$, then $E - E$ contains
$(-\frac12 m(I), \frac12 m(I))$.)

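32. Suppose $\{\alpha_j\}_1^\infty \subset (0, 1)$.
a. $\prod_1^\infty (1 - \alpha_j) > 0$ iff $\sum_1^\infty \alpha_j < \infty$. (Compare $\sum_1^\infty \log(1 - \alpha_j)$ to $\sum_1^\infty \alpha_j$.)
b. Given $\beta \in (0, 1)$, exhibit a sequence $\{\alpha_j\}$ such that $\prod_1^\infty (1 - \alpha_j) = \beta$.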


33. There exists a Borel set $A \subset [0, 1]$ such that $0 < m(A \cap I) < m(I)$ for every
subinterval $I$ of $[0, 1]$. (Hint: Every subinterval of $[0, 1]$ contains Cantor-type sets of
positive measure.) 


1.6 NOTES AND REFERENCES 


The history of measure theory is intimately connected with the history of integration 
theory, comments on which will be made in §2.7. 


§1.1: The Banach-Tarski paradox appeared first in [11], but the following variant
goes back to Hausdorff [68]:


The unit sphere in $\mathbb{R}^3$, $\{x \in \mathbb{R}^3 : |x| = 1\}$, is the disjoint union of four sets
$E_1, \ldots, E_4$ such that (a) $E_1$ is countable and (b) the sets $E_2$, $E_3$, $E_4$, and
$E_3 \cup E_4$ are all images of each other under rotations.


An elementary exposition of the Banach-Tarski paradox and Hausdorff’s result can 
be found in Stromberg [146]. 


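§1.2: Our characterization of the $\sigma$-algebra $\mathcal{M}(\mathcal{E})$ generated by a family $\mathcal{E} \subset
\mathcal{P}(X)$ is nonconstructive, and one might ask how to obtain $\mathcal{M}(\mathcal{E})$ explicitly from $\mathcal{E}$.
The answer is rather complicated. One can begin as follows: Let $\mathcal{E}_1 = \mathcal{E} \cup \{E^c :
E \in \mathcal{E}\}$, and for $j > 1$ define $\mathcal{E}_j$ to be the collection of all sets that are countable
unions of sets in $\mathcal{E}_{j-1}$ or complements of such. Let $\mathcal{E}_\omega = \bigcup_1^\infty \mathcal{E}_j$: is $\mathcal{E}_\omega = \mathcal{M}(\mathcal{E})$?
In general, no. $\mathcal{E}_\omega$ is closed under complements, but if $E_j \in \mathcal{E}_j \setminus \mathcal{E}_{j-1}$ for each $j$,
there is no reason for $\bigcup_1^\infty E_j$ to be in $\mathcal{E}_\omega$. So one must start all over again. More
precisely, one must define $\mathcal{E}_\alpha$ for every countable ordinal $\alpha$ by transfinite induction:
If $\alpha$ has an immediate predecessor $\beta$, $\mathcal{E}_\alpha$ is the collection of sets that are countable
unions of sets in $\mathcal{E}_\beta$ or complements of such; otherwise, $\mathcal{E}_\alpha = \bigcup_{\beta < \alpha} \mathcal{E}_\beta$. Then: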


1.23 Proposition. $\mathcal{M}(\mathcal{E}) = \bigcup_{\alpha \in \Omega} \mathcal{E}_\alpha$, where $\Omega$ is the set of countable ordinals.


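Proof. Transfinite induction shows that $\mathcal{E}_\alpha \subset \mathcal{M}(\mathcal{E})$ for all $\alpha \in \Omega$, and hence $\bigcup_{\alpha \in \Omega} \mathcal{E}_\alpha \subset \mathcal{M}(\mathcal{E})$. The reverse inclusion follows from the fact that any sequence in $\Omega$ has a supremum in $\Omega$ (Proposition 0.19): If $E_j \in \mathcal{E}_{\alpha_j}$ for $j \in \mathbb{N}$ and $\beta = \sup\{\alpha_j\}$, then $E_j \in \mathcal{E}_\beta$ for all $j$ and hence $\bigcup_1^\infty E_j \in \mathcal{E}_\gamma$ where $\gamma$ is the successor of $\beta$. ∎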
Combining this with Proposition 0.14, we see that if $\operatorname{card}(\mathbb{N}) \le \operatorname{card}(\mathcal{E}) \le \mathfrak{c}$, then $\operatorname{card}(\mathcal{M}(\mathcal{E})) = \mathfrak{c}$. (Cf. Exercise 3.)


§1.3: Some authors prefer to take the domains of measures to be σ-rings rather than σ-algebras (see Exercise 1). The reason is that in dealing with “very large” spaces one can avoid certain pathologies by not attempting to measure “very large” sets. However, this point of view also has technical disadvantages, and it is no longer much in favor.


§1.4: Carathéodory’s theorem appears in his treatise [22]. Theorem 1.14 has
been attributed in the literature to Hahn, Carathéodory, and E. Hopf, but it is orig- 
inally due to Fréchet [54]. The proof via Carathéodory’s theorem was discovered 
independently by Hahn [60] and Kolmogorov [85]. 

See König [86] for a deeper study of the problem of constructing measures from 
more primitive data. 


§1.5: Lebesgue originally defined the outer measure $m^*(E)$ of a set $E \subset \mathbb{R}$ in terms of countable coverings by intervals, as we have done. He then defined a bounded set $E$ to be measurable if $m^*(E) + m^*((a,b) \setminus E) = b - a$, where $(a,b)$ is an interval containing $E$, and an unbounded set to be measurable if its intersection with any bounded interval is measurable. Carathéodory’s characterization of measurability, which is technically easier to work with, came later. For the equivalence of the two definitions, see Exercise 19.

One should convince oneself that the remarkably fussy proof of Proposition 1.15 
is necessary by contemplating the complicated ways in which an h-interval can be 
decomposed into a disjoint union of h-subintervals. In any such decomposition the 
collection of right endpoints of the subintervals, when ordered from right to left, is 
a well ordered set, but it can be order isomorphic to any initial segment of the set of 
countable ordinals. 

Lebesgue measure can be extended to a translation-invariant measure on σ-algebras that properly include $\mathcal{L}$; see Kakutani and Oxtoby [81]. Of course, such σ-algebras can never contain the nonmeasurable set discussed in §1.1. However, Lebesgue measure can be extended to a translation-invariant finitely additive measure on $\mathcal{P}(\mathbb{R})$, and its 2-dimensional analogue (see §2.6) can be extended to a finitely additive measure on $\mathcal{P}(\mathbb{R}^2)$ that is invariant under translations and rotations; see Banach [8]. The Banach-Tarski paradox prevents this result from being extended to higher dimensions.

In connection with the existence of nonmeasurable sets, Solovay [138] has proved 
a remarkable theorem which says in effect that it is impossible to prove the existence 
of Lebesgue nonmeasurable sets without using the axiom of choice. (The precise 
statement of the theorem involves some technical points of axiomatic set theory, which
we shall not discuss here.) From the point of view of the working analyst, the effect of 
Solovay’s theorem is to reaffirm the adequacy of the Lebesgue theory for all practical 
purposes. 

See Rudin [124] for a terse solution of Exercise 33. 





Integration 


In the classical theory of integration on $\mathbb{R}$, $\int_a^b f(x)\,dx$ is defined as a limit of Riemann sums, which are integrals of functions that approximate $f$ and are constant on subintervals of $[a,b]$. Similarly, on any measure space there is an obvious notion of integral for functions that are, in a suitable sense, locally constant, and it can be extended to an integral for more general functions. In this chapter, we develop the theory of integration on abstract measure spaces, paying particular attention to the Lebesgue integral on $\mathbb{R}$ and its generalization to $\mathbb{R}^n$.


2.1 MEASURABLE FUNCTIONS 


We begin our study of integration theory with a discussion of measurable mappings, 
which are the morphisms in the category of measurable spaces. 

We recall that any mapping $f : X \to Y$ between two sets induces a mapping $f^{-1} : \mathcal{P}(Y) \to \mathcal{P}(X)$, defined by $f^{-1}(E) = \{x \in X : f(x) \in E\}$, which preserves unions, intersections, and complements. Thus, if $\mathcal{N}$ is a σ-algebra on $Y$, $\{f^{-1}(E) : E \in \mathcal{N}\}$ is a σ-algebra on $X$. If $(X,\mathcal{M})$ and $(Y,\mathcal{N})$ are measurable spaces, a mapping $f : X \to Y$ is called $(\mathcal{M},\mathcal{N})$-measurable, or just measurable when $\mathcal{M}$ and $\mathcal{N}$ are understood, if $f^{-1}(E) \in \mathcal{M}$ for all $E \in \mathcal{N}$.

It is obvious that the composition of measurable mappings is measurable; that is, if $f : X \to Y$ is $(\mathcal{M},\mathcal{N})$-measurable and $g : Y \to Z$ is $(\mathcal{N},\mathcal{O})$-measurable, then $g \circ f$ is $(\mathcal{M},\mathcal{O})$-measurable.


2.1 Proposition. If $\mathcal{N}$ is generated by $\mathcal{E}$, then $f : X \to Y$ is $(\mathcal{M},\mathcal{N})$-measurable iff $f^{-1}(E) \in \mathcal{M}$ for all $E \in \mathcal{E}$.
Proof. The “only if” implication is trivial. For the converse, observe that $\{E \subset Y : f^{-1}(E) \in \mathcal{M}\}$ is a σ-algebra that contains $\mathcal{E}$; it therefore contains $\mathcal{N}$. ∎
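On a finite set, countable unions reduce to finite unions, so Proposition 2.1 can be checked by brute force. The following Python sketch is an illustration under that assumption only. The helper names `generate_sigma_algebra`, `preimage`, and `is_measurable_via_generators` are hypothetical, and maps are stored as dictionaries. The sketch builds the σ-algebra generated by a single set on each side, then tests measurability through the generators alone.

```python
from itertools import combinations


def generate_sigma_algebra(X, family):
    """Smallest sigma-algebra on the finite set X containing every set in `family`.

    On a finite set every countable union is a finite union, so it suffices to
    close under complements and pairwise unions until no new sets appear.
    """
    X = frozenset(X)
    sets = {frozenset(), X} | {frozenset(E) for E in family}
    while True:
        new = {X - A for A in sets} | {A | B for A, B in combinations(sets, 2)}
        if new <= sets:
            return sets
        sets |= new


def preimage(f, X, E):
    """f^{-1}(E) = {x in X : f(x) in E}, where the map f is stored as a dict."""
    return frozenset(x for x in X if f[x] in E)


def is_measurable_via_generators(f, X, M, generators):
    """Proposition 2.1: it suffices to test f^{-1}(E) in M for E in a generating family."""
    return all(preimage(f, X, E) in M for E in generators)


X = {1, 2, 3, 4}
Y = {"a", "b", "c"}
M = generate_sigma_algebra(X, [{1, 2}])   # empty set, {1,2}, {3,4}, X
gens = [frozenset({"a"})]
N = generate_sigma_algebra(Y, gens)       # empty set, {a}, {b,c}, Y

f = {1: "a", 2: "a", 3: "b", 4: "c"}
g = {1: "a", 2: "b", 3: "b", 4: "c"}
print(is_measurable_via_generators(f, X, M, gens))  # True
print(is_measurable_via_generators(g, X, M, gens))  # False: g^{-1}({a}) = {1}
```

Testing only the generator $\{a\}$ gives the same verdict as testing every set in $\mathcal{N}$, which is what Proposition 2.1 asserts. On infinite spaces no such enumeration is possible, and the σ-algebra argument in the proof takes its place.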


2.2 Corollary. If $X$ and $Y$ are metric (or topological) spaces, every continuous $f : X \to Y$ is $(\mathcal{B}_X,\mathcal{B}_Y)$-measurable.


Proof. $f$ is continuous iff $f^{-1}(U)$ is open in $X$ for every open $U \subset Y$; since the open sets generate $\mathcal{B}_Y$, Proposition 2.1 applies. ∎


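If $(X,\mathcal{M})$ is a measurable space, a real- or complex-valued function $f$ on $X$ will be called $\mathcal{M}$-measurable, or just measurable, if it is $(\mathcal{M},\mathcal{B}_\mathbb{R})$ or $(\mathcal{M},\mathcal{B}_\mathbb{C})$ measurable. $\mathcal{B}_\mathbb{R}$ or $\mathcal{B}_\mathbb{C}$ is always understood as the σ-algebra on the range space unless otherwise specified. In particular, $f : \mathbb{R} \to \mathbb{C}$ is Lebesgue (resp. Borel) measurable if it is $(\mathcal{L},\mathcal{B}_\mathbb{C})$ (resp. $(\mathcal{B}_\mathbb{R},\mathcal{B}_\mathbb{C})$) measurable; likewise for $f : \mathbb{R} \to \mathbb{R}$.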

Warning: If $f, g : \mathbb{R} \to \mathbb{R}$ are Lebesgue measurable, it does not follow that $f \circ g$ is Lebesgue measurable, even if $g$ is assumed continuous. (If $E \in \mathcal{B}_\mathbb{R}$ we have $f^{-1}(E) \in \mathcal{L}$, but unless $f^{-1}(E) \in \mathcal{B}_\mathbb{R}$ there is no guarantee that $g^{-1}(f^{-1}(E))$ will be in $\mathcal{L}$. See Exercise 9.) However, if $f$ is Borel measurable, then $f \circ g$ is Lebesgue or Borel measurable whenever $g$ is.


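2.3 Proposition. If $(X,\mathcal{M})$ is a measurable space and $f : X \to \mathbb{R}$, the following are equivalent:
a. $f$ is $\mathcal{M}$-measurable.
b. $f^{-1}((a,\infty)) \in \mathcal{M}$ for all $a \in \mathbb{R}$.
c. $f^{-1}([a,\infty)) \in \mathcal{M}$ for all $a \in \mathbb{R}$.
d. $f^{-1}((-\infty,a)) \in \mathcal{M}$ for all $a \in \mathbb{R}$.
e. $f^{-1}((-\infty,a]) \in \mathcal{M}$ for all $a \in \mathbb{R}$.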


Proof. This follows from Propositions 1.2 and 2.1. ∎


Sometimes we wish to consider measurability on subsets of $X$. If $(X,\mathcal{M})$ is a measurable space, $f$ is a function on $X$, and $E \in \mathcal{M}$, we say that $f$ is measurable on $E$ if $f^{-1}(B) \cap E \in \mathcal{M}$ for all Borel sets $B$. (Equivalently, $f|E$ is $\mathcal{M}_E$-measurable, where $\mathcal{M}_E = \{F \cap E : F \in \mathcal{M}\}$.)

Given a set $X$, if $\{(Y_\alpha,\mathcal{N}_\alpha)\}_{\alpha \in A}$ is a family of measurable spaces, and $f_\alpha : X \to Y_\alpha$ is a map for each $\alpha \in A$, there is a unique smallest σ-algebra on $X$ with respect to which the $f_\alpha$'s are all measurable, namely, the σ-algebra generated by the sets $f_\alpha^{-1}(E_\alpha)$ with $E_\alpha \in \mathcal{N}_\alpha$ and $\alpha \in A$. It is called the σ-algebra generated by $\{f_\alpha\}_{\alpha \in A}$. In particular, if $X = \prod_{\alpha \in A} Y_\alpha$, we see that the product σ-algebra on $X$, as defined in §1.2, is the σ-algebra generated by the coordinate maps $\pi_\alpha : X \to Y_\alpha$.


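2.4 Proposition. Let $(X,\mathcal{M})$ and $(Y_\alpha,\mathcal{N}_\alpha)$ ($\alpha \in A$) be measurable spaces, $Y = \prod_{\alpha \in A} Y_\alpha$, $\mathcal{N} = \bigotimes_{\alpha \in A} \mathcal{N}_\alpha$, and $\pi_\alpha : Y \to Y_\alpha$ the coordinate maps. Then $f : X \to Y$ is $(\mathcal{M},\mathcal{N})$-measurable iff $f_\alpha = \pi_\alpha \circ f$ is $(\mathcal{M},\mathcal{N}_\alpha)$-measurable for all $\alpha$.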


Proof. If $f$ is measurable, so is each $f_\alpha$ since the composition of measurable maps is measurable. Conversely, if each $f_\alpha$ is measurable, then for all $E_\alpha \in \mathcal{N}_\alpha$, $f^{-1}(\pi_\alpha^{-1}(E_\alpha)) = f_\alpha^{-1}(E_\alpha) \in \mathcal{M}$, whence $f$ is measurable by Proposition 2.1. ∎
2.5 Corollary. A function $f : X \to \mathbb{C}$ is $\mathcal{M}$-measurable iff $\operatorname{Re} f$ and $\operatorname{Im} f$ are $\mathcal{M}$-measurable.


Proof. This follows since $\mathcal{B}_\mathbb{C} = \mathcal{B}_{\mathbb{R}^2} = \mathcal{B}_\mathbb{R} \otimes \mathcal{B}_\mathbb{R}$ by Proposition 1.5. ∎


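It is sometimes convenient to consider functions with values in the extended real number system $\overline{\mathbb{R}} = [-\infty,\infty]$. We define Borel sets in $\overline{\mathbb{R}}$ by $\mathcal{B}_{\overline{\mathbb{R}}} = \{E \subset \overline{\mathbb{R}} : E \cap \mathbb{R} \in \mathcal{B}_\mathbb{R}\}$. (This coincides with the usual definition of the Borel σ-algebra if we make $\overline{\mathbb{R}}$ into a metric space with metric $\rho(x,y) = |A(x) - A(y)|$, where $A(x) = \arctan x$ and $A(\pm\infty) = \pm\pi/2$.) It is easily verified as in Proposition 2.3 that $\mathcal{B}_{\overline{\mathbb{R}}}$ is generated by the rays $(a,\infty]$ or $[-\infty,a)$ ($a \in \mathbb{R}$), and we define $f : X \to \overline{\mathbb{R}}$ to be $\mathcal{M}$-measurable if it is $(\mathcal{M},\mathcal{B}_{\overline{\mathbb{R}}})$-measurable. See Exercise 1.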
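The short Python sketch below shows concretely how $\rho$ turns $\overline{\mathbb{R}}$ into a bounded metric space. It assumes the convention $A(\pm\infty) = \pm\pi/2$, which matches what Python's `math.atan` returns at infinite arguments. The name `rho` is chosen here for illustration.

```python
import math


def rho(x, y):
    """Distance |arctan x - arctan y| on the extended reals [-inf, inf].

    math.atan(math.inf) is pi/2 and math.atan(-math.inf) is -pi/2, so the
    formula applies unchanged at the two endpoints.
    """
    return abs(math.atan(x) - math.atan(y))


for x in (10.0, 1e3, 1e6):
    # Large finite reals are close to +inf in this metric.
    print(x, rho(x, math.inf))

print(rho(-math.inf, math.inf))  # the diameter of the extended real line is pi
```

The usual metric $|x - y|$ cannot be extended to $\pm\infty$ this way. $\rho$ is finite everywhere, and a sequence tends to $+\infty$ in the ordinary sense exactly when it converges to $+\infty$ in $\rho$.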

We now establish that measurability is preserved under the familiar algebraic and 
limiting operations. 





2.6 Proposition. If $f, g : X \to \mathbb{C}$ are $\mathcal{M}$-measurable, then so are $f + g$ and $fg$.


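Proof. Define $F : X \to \mathbb{C} \times \mathbb{C}$, $\phi : \mathbb{C} \times \mathbb{C} \to \mathbb{C}$, and $\psi : \mathbb{C} \times \mathbb{C} \to \mathbb{C}$ by $F(x) = (f(x), g(x))$, $\phi(z,w) = z + w$, $\psi(z,w) = zw$. Since $\mathcal{B}_{\mathbb{C}\times\mathbb{C}} = \mathcal{B}_\mathbb{C} \otimes \mathcal{B}_\mathbb{C}$ by Proposition 1.5, $F$ is $(\mathcal{M},\mathcal{B}_{\mathbb{C}\times\mathbb{C}})$-measurable by Proposition 2.4, whereas $\phi$ and $\psi$ are $(\mathcal{B}_{\mathbb{C}\times\mathbb{C}},\mathcal{B}_\mathbb{C})$-measurable by Corollary 2.2. Thus $f + g = \phi \circ F$ and $fg = \psi \circ F$ are $\mathcal{M}$-measurable. ∎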


Proposition 2.6 remains valid for $\overline{\mathbb{R}}$-valued functions provided one takes a little care with the indeterminate expressions $\infty - \infty$ and $0 \cdot \infty$. (Recall, however, that by convention we always define $0 \cdot \infty$ to be 0.) See Exercise 2.


2.7 Proposition. If $\{f_j\}$ is a sequence of $\overline{\mathbb{R}}$-valued measurable functions on $(X,\mathcal{M})$, then the functions
$$g_1(x) = \sup_j f_j(x), \qquad g_3(x) = \limsup_{j\to\infty} f_j(x),$$
$$g_2(x) = \inf_j f_j(x), \qquad g_4(x) = \liminf_{j\to\infty} f_j(x)$$
are all measurable. If $f(x) = \lim_{j\to\infty} f_j(x)$ exists for every $x \in X$, then $f$ is measurable.


Proof. We have
$$g_1^{-1}((a,\infty]) = \bigcup_1^\infty f_j^{-1}((a,\infty]), \qquad g_2^{-1}([-\infty,a)) = \bigcup_1^\infty f_j^{-1}([-\infty,a)),$$
so $g_1$ and $g_2$ are measurable by Proposition 2.3. More generally, if $h_k(x) = \sup_{j \ge k} f_j(x)$, then $h_k$ is measurable for each $k$, so $g_3 = \inf_k h_k$ is measurable, and likewise for $g_4$. Finally, if $f$ exists, then $f = g_3 = g_4$, so $f$ is measurable. ∎
2.8 Corollary. If $f, g : X \to \overline{\mathbb{R}}$ are measurable, then so are $\max(f,g)$ and $\min(f,g)$.


2.9 Corollary. If $\{f_j\}$ is a sequence of complex-valued measurable functions and $f(x) = \lim_{j\to\infty} f_j(x)$ exists for all $x$, then $f$ is measurable.


Proof. Apply Proposition 2.7 to $\operatorname{Re} f_j$ and $\operatorname{Im} f_j$, and then Corollary 2.5. ∎


For future reference we present two useful decompositions of functions. First, if $f : X \to \overline{\mathbb{R}}$, we define the positive and negative parts of $f$ to be
$$f^+(x) = \max(f(x), 0), \qquad f^-(x) = \max(-f(x), 0).$$
Then $f = f^+ - f^-$. If $f$ is measurable, so are $f^+$ and $f^-$, by Corollary 2.8. Second, if $f : X \to \mathbb{C}$, we have its polar decomposition:
$$f = (\operatorname{sgn} f)\,|f|, \qquad \text{where } \operatorname{sgn} z = \begin{cases} z/|z| & \text{if } z \neq 0, \\ 0 & \text{if } z = 0. \end{cases}$$


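Again, if $f$ is measurable, so are $|f|$ and $\operatorname{sgn} f$. Indeed, $z \mapsto |z|$ is continuous on $\mathbb{C}$, and $z \mapsto \operatorname{sgn} z$ is continuous except at the origin. If $U \subset \mathbb{C}$ is open, $\operatorname{sgn}^{-1}(U)$ is either open or of the form $V \cup \{0\}$ where $V$ is open, so $\operatorname{sgn}$ is Borel measurable. Therefore $|f| = |\cdot| \circ f$ and $\operatorname{sgn} f = \operatorname{sgn} \circ f$ are measurable.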
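Both decompositions are pointwise identities, so they can be checked value by value. In the minimal Python sketch below, the functions `pos_part`, `neg_part`, and `sgn` (hypothetical names) each act on a single value. Applying them to $f(x)$ therefore gives $f^+(x)$, $f^-(x)$, and $(\operatorname{sgn} f)(x)$. The assertions check $f = f^+ - f^-$, $|f| = f^+ + f^-$, and $z = (\operatorname{sgn} z)|z|$ at a few sample points.

```python
def pos_part(t):
    """f^+(x) = max(f(x), 0), evaluated at the value t = f(x)."""
    return max(t, 0.0)


def neg_part(t):
    """f^-(x) = max(-f(x), 0), evaluated at the value t = f(x)."""
    return max(-t, 0.0)


def sgn(z):
    """sgn z = z/|z| if z != 0, and sgn 0 = 0."""
    return z / abs(z) if z != 0 else 0


for t in (-2.5, 0.0, 3.0):
    assert pos_part(t) - neg_part(t) == t        # f = f^+ - f^-
    assert pos_part(t) + neg_part(t) == abs(t)   # |f| = f^+ + f^-

for z in (3 + 4j, -2j, 0j):
    assert abs(sgn(z) * abs(z) - z) < 1e-12        # z = (sgn z)|z|
    assert z == 0 or abs(abs(sgn(z)) - 1) < 1e-12  # |sgn z| = 1 off the origin
```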

We now discuss the functions that are the building blocks for the theory of integration. Suppose that $(X,\mathcal{M})$ is a measurable space. If $E \subset X$, the characteristic function $\chi_E$ of $E$ (sometimes called the indicator function of $E$ and denoted by $1_E$) is defined by
$$\chi_E(x) = \begin{cases} 1 & \text{if } x \in E, \\ 0 & \text{if } x \notin E. \end{cases}$$


It is easily checked that $\chi_E$ is measurable iff $E \in \mathcal{M}$. A simple function on $X$ is a finite linear combination, with complex coefficients, of characteristic functions of sets in $\mathcal{M}$. (We do not allow simple functions to assume the values $\pm\infty$.) Equivalently, $f : X \to \mathbb{C}$ is simple iff $f$ is measurable and the range of $f$ is a finite subset of $\mathbb{C}$. Indeed, we have
$$f = \sum_1^n z_j\chi_{E_j}, \qquad \text{where } E_j = f^{-1}(\{z_j\}) \text{ and } \operatorname{range}(f) = \{z_1, \ldots, z_n\}.$$


We call this the standard representation of $f$. It exhibits $f$ as a linear combination, with distinct coefficients, of characteristic functions of disjoint sets whose union is $X$. Note: One of the coefficients $z_j$ may well be 0, but the term $z_j\chi_{E_j}$ is still to be envisioned as part of the standard representation, as the set $E_j$ may have a role to play when $f$ interacts with other functions.
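On a finite set $X$, the standard representation can be computed by grouping points according to the value of $f$. The sketch below assumes that $f$ is given as a Python dictionary from points to values; every function on a finite set with the power-set σ-algebra is simple. `standard_representation` is a hypothetical helper, not a library function. As the text requires, the value 0 keeps its own set $E_j$.

```python
from collections import defaultdict


def standard_representation(f):
    """Return {z_j: E_j} with E_j = f^{-1}({z_j}), for f given as a dict on a finite X.

    The z_j are the distinct values of f, and the E_j are disjoint with union X.
    A value 0 keeps its set, as in the text.
    """
    rep = defaultdict(set)
    for x, value in f.items():
        rep[value].add(x)
    return {z: frozenset(E) for z, E in rep.items()}


f = {1: 2.0, 2: 0.0, 3: 2.0, 4: -1.5, 5: 0.0}
rep = standard_representation(f)   # three level sets, for the values 2.0, 0.0, -1.5

# Reassemble f = sum_j z_j chi_{E_j} pointwise.
rebuilt = {x: sum(z * (x in E) for z, E in rep.items()) for x in f}
assert rebuilt == f
```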

It is clear that if f and g are simple functions, then so are f + g and fg. We 
now show that arbitrary measurable functions can be approximated in a nice way by 
simple functions. 
2.10 Theorem. Let $(X,\mathcal{M})$ be a measurable space.
a. If $f : X \to [0,\infty]$ is measurable, there is a sequence $\{\phi_n\}$ of simple functions such that $0 \le \phi_1 \le \phi_2 \le \cdots \le f$, $\phi_n \to f$ pointwise, and $\phi_n \to f$ uniformly on any set on which $f$ is bounded.
b. If $f : X \to \mathbb{C}$ is measurable, there is a sequence $\{\phi_n\}$ of simple functions such that $0 \le |\phi_1| \le |\phi_2| \le \cdots \le |f|$, $\phi_n \to f$ pointwise, and $\phi_n \to f$ uniformly on any set on which $f$ is bounded.


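Proof. (a) For $n = 0, 1, 2, \ldots$ and $0 \le k \le 2^{2n} - 1$, let
$$E_n^k = f^{-1}\bigl((k2^{-n}, (k+1)2^{-n}]\bigr) \quad\text{and}\quad F_n = f^{-1}\bigl((2^n, \infty]\bigr),$$
and define
$$\phi_n = \sum_{k=0}^{2^{2n}-1} k2^{-n}\chi_{E_n^k} + 2^n\chi_{F_n}.$$
(This formula is messy in print but easily understood graphically; see Figure 2.1.) It is easily checked that $\phi_n \le \phi_{n+1}$ for all $n$, and $0 \le f - \phi_n \le 2^{-n}$ on the set where $f \le 2^n$. The result therefore follows.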

(b) If $f = g + ih$, we can apply part (a) to the positive and negative parts of $g$ and $h$, obtaining sequences $\{\psi_n^+\}$, $\{\psi_n^-\}$, $\{\zeta_n^+\}$, $\{\zeta_n^-\}$ of nonnegative simple functions that increase to $g^+$, $g^-$, $h^+$, $h^-$. Let $\phi_n = \psi_n^+ - \psi_n^- + i(\zeta_n^+ - \zeta_n^-)$; it is then a simple exercise to verify that $\phi_n$ has the desired properties. ∎
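The function $\phi_n$ from part (a) depends on $x$ only through the value $f(x)$, so it can be evaluated pointwise. The Python sketch below takes a value $t = f(x) \in [0,\infty]$ and returns $\phi_n(x)$; the function name `phi` is illustrative. The loop checks, at a few sample values, the two facts the proof calls easily checked: $\phi_n \le \phi_{n+1} \le f$, and $0 \le f - \phi_n \le 2^{-n}$ wherever $f \le 2^n$.

```python
import math


def phi(n, t):
    """Value of phi_n at a point x where t = f(x) lies in [0, inf].

    phi_n = sum_{k=0}^{2^{2n}-1} k 2^{-n} chi_{E_n^k} + 2^n chi_{F_n}, where
    E_n^k = f^{-1}((k 2^{-n}, (k+1) 2^{-n}]) and F_n = f^{-1}((2^n, inf]).
    """
    if t > 2 ** n:
        return float(2 ** n)          # x lies in F_n
    if t == 0:
        return 0.0                    # x lies in no E_n^k (the intervals are open on the left)
    k = math.ceil(t * 2 ** n) - 1     # the unique k with k 2^{-n} < t <= (k+1) 2^{-n}
    return k / 2 ** n


values = [0.0, 0.3, 1.0, math.pi, 7.9, math.inf]
for n in range(6):
    for t in values:
        assert phi(n, t) <= phi(n + 1, t) <= t       # 0 <= phi_n <= phi_{n+1} <= f
        if t <= 2 ** n:
            assert 0 <= t - phi(n, t) <= 2 ** -n      # error bound where f <= 2^n
```

The sample values include $t = \infty$. At such points only $F_n$ contributes, and $\phi_n = 2^n$ increases without bound, as required for pointwise convergence to $f = \infty$.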


If $\mu$ is a measure on $(X,\mathcal{M})$, we may wish to except $\mu$-null sets from consideration in studying measurable functions. In this respect, life is a bit simpler if $\mu$ is complete.


2.11 Proposition. The following implications are valid iff the measure $\mu$ is complete:
a. If $f$ is measurable and $f = g$ $\mu$-a.e., then $g$ is measurable.
b. If $f_n$ is measurable for $n \in \mathbb{N}$ and $f_n \to f$ $\mu$-a.e., then $f$ is measurable.


The proof is left to the reader (Exercise 10). 
On the other hand, the following result shows that one is unlikely to commit any 
serious blunders by forgetting to worry about completeness of the measure. 





Fig. 2.1 The functions $\phi_0$ (left) and $\phi_1$ (right) in the proof of Theorem 2.10a.
2.12 Proposition. Let $(X,\mathcal{M},\mu)$ be a measure space and let $(X,\overline{\mathcal{M}},\overline{\mu})$ be its completion. If $f$ is an $\overline{\mathcal{M}}$-measurable function on $X$, there is an $\mathcal{M}$-measurable function $g$ such that $f = g$ $\overline{\mu}$-almost everywhere.


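Proof. This is obvious from the definition of $\overline{\mu}$ if $f = \chi_E$ where $E \in \overline{\mathcal{M}}$, and hence if $f$ is an $\overline{\mathcal{M}}$-measurable simple function. For the general case, choose a sequence $\{\phi_n\}$ of $\overline{\mathcal{M}}$-measurable simple functions that converge pointwise to $f$ according to Theorem 2.10, and for each $n$ let $\psi_n$ be an $\mathcal{M}$-measurable simple function with $\psi_n = \phi_n$ except on a set $E_n \in \overline{\mathcal{M}}$ with $\overline{\mu}(E_n) = 0$. Choose $N \in \mathcal{M}$ such that $\mu(N) = 0$ and $N \supset \bigcup_1^\infty E_n$, and set $g = \lim \chi_{X \setminus N}\psi_n$. Then $g$ is $\mathcal{M}$-measurable by Corollary 2.9, and $g = f$ on $N^c$. ∎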


Exercises 
In Exercises 1-7, (X, M) is a measurable space. 


1. Let $f : X \to \overline{\mathbb{R}}$ and $Y = f^{-1}(\mathbb{R})$. Then $f$ is measurable iff $f^{-1}(\{-\infty\}) \in \mathcal{M}$, $f^{-1}(\{\infty\}) \in \mathcal{M}$, and $f$ is measurable on $Y$.


2. Suppose $f, g : X \to \overline{\mathbb{R}}$ are measurable.
a. $fg$ is measurable (where $0 \cdot (\pm\infty) = 0$).
b. Fix $a \in \overline{\mathbb{R}}$ and define $h(x) = a$ if $f(x) = -g(x) = \pm\infty$ and $h(x) = f(x) + g(x)$ otherwise. Then $h$ is measurable.


3. If $\{f_n\}$ is a sequence of measurable functions on $X$, then $\{x : \lim f_n(x) \text{ exists}\}$ is a measurable set.


4. If $f : X \to \overline{\mathbb{R}}$ and $f^{-1}((r,\infty]) \in \mathcal{M}$ for each $r \in \mathbb{Q}$, then $f$ is measurable.


5. If $X = A \cup B$ where $A, B \in \mathcal{M}$, a function $f$ on $X$ is measurable iff $f$ is measurable on $A$ and on $B$.


6. The supremum of an uncountable family of measurable $\mathbb{R}$-valued functions on $X$ can fail to be measurable (unless the σ-algebra $\mathcal{M}$ is very special).


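7. Suppose that for each $\alpha \in \mathbb{R}$ we are given a set $E_\alpha \in \mathcal{M}$ such that $E_\alpha \subset E_\beta$ whenever $\alpha < \beta$, $\bigcup_{\alpha \in \mathbb{R}} E_\alpha = X$, and $\bigcap_{\alpha \in \mathbb{R}} E_\alpha = \emptyset$. Then there is a measurable function $f : X \to \mathbb{R}$ such that $f(x) \le \alpha$ on $E_\alpha$ and $f(x) \ge \alpha$ on $E_\alpha^c$ for every $\alpha$. (Use Exercise 4.)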


8. If $f : \mathbb{R} \to \mathbb{R}$ is monotone, then $f$ is Borel measurable.


9. Let $f : [0,1] \to [0,1]$ be the Cantor function (§1.5), and let $g(x) = f(x) + x$.
a. $g$ is a bijection from $[0,1]$ to $[0,2]$, and $h = g^{-1}$ is continuous from $[0,2]$ to $[0,1]$.
b. If $C$ is the Cantor set, $m(g(C)) = 1$.
c. By Exercise 29 of Chapter 1, $g(C)$ contains a Lebesgue nonmeasurable set $A$. Let $B = g^{-1}(A)$. Then $B$ is Lebesgue measurable but not Borel.
d. There exist a Lebesgue measurable function $F$ and a continuous function $G$ on $\mathbb{R}$ such that $F \circ G$ is not Lebesgue measurable.


10. Prove Proposition 2.11. 


11. Suppose that $f$ is a function on $\mathbb{R} \times \mathbb{R}^k$ such that $f(x,\cdot)$ is Borel measurable for each $x \in \mathbb{R}$ and $f(\cdot,y)$ is continuous for each $y \in \mathbb{R}^k$. For $n \in \mathbb{N}$, define $f_n$ as follows. For $i \in \mathbb{Z}$ let $a_i = i/n$, and for $a_i \le x \le a_{i+1}$ let
$$f_n(x,y) = \frac{f(a_{i+1},y)(x - a_i) - f(a_i,y)(x - a_{i+1})}{a_{i+1} - a_i}.$$
Then $f_n$ is Borel measurable on $\mathbb{R} \times \mathbb{R}^k$ and $f_n \to f$ pointwise; hence $f$ is Borel measurable on $\mathbb{R} \times \mathbb{R}^k$. Conclude by induction that every function on $\mathbb{R}^n$ that is continuous in each variable separately is Borel measurable.


2.2 INTEGRATION OF NONNEGATIVE FUNCTIONS 


In this section we fix a measure space $(X,\mathcal{M},\mu)$, and we define
$$L^+ = \text{the space of all measurable functions from } X \text{ to } [0,\infty].$$


If $\phi$ is a simple function in $L^+$ with standard representation $\phi = \sum_1^n a_j\chi_{E_j}$, we define the integral of $\phi$ with respect to $\mu$ by
$$\int \phi\,d\mu = \sum_1^n a_j\,\mu(E_j)$$


(with the convention, as always, that $0 \cdot \infty = 0$). We note that $\int \phi\,d\mu$ may equal $\infty$. When there is no danger of confusion, we shall also write $\int \phi$ for $\int \phi\,d\mu$. Also, it is sometimes convenient to display the argument of $\phi$ explicitly, especially when $\phi(x)$ is given by a formula in terms of $x$ or when there are other variables involved; in this case we shall use the notation $\int \phi(x)\,d\mu(x)$. (Some authors prefer to write $\int \phi(x)\,\mu(dx)$ instead.) Finally, if $A \in \mathcal{M}$, then $\phi\chi_A$ is also simple (viz., $\phi\chi_A = \sum a_j\chi_{A \cap E_j}$), and we define $\int_A \phi\,d\mu$ (or $\int_A \phi$, or $\int_A \phi(x)\,d\mu(x)$) to be $\int \phi\chi_A\,d\mu$. The same notational conventions will also apply to the integrals of more general functions to be defined below. To summarize:


$$\int \phi\,d\mu = \int \phi = \int \phi(x)\,d\mu(x), \qquad \int_A \phi\,d\mu = \int \phi\chi_A\,d\mu.$$
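For a finite set $X$ with point masses, the definition $\int \phi\,d\mu = \sum_j a_j\mu(E_j)$ can be computed directly from the standard representation. The sketch below assumes $\mu(E) = \sum_{x \in E}\mu(\{x\})$, with finite masses stored in a dictionary, and uses exact rational arithmetic. It also checks the additivity asserted in Proposition 2.13b on one example. `integral_simple` is a hypothetical name.

```python
from collections import defaultdict
from fractions import Fraction


def integral_simple(phi, mu):
    """Integral of a nonnegative simple function on a finite measure space.

    phi maps each point x to phi(x) >= 0; mu maps each point to its (finite) mass,
    so that mu(E) is the sum of the masses of the points of E.  We form the
    standard representation phi = sum_j a_j chi_{E_j} and return sum_j a_j mu(E_j).
    """
    level_sets = defaultdict(list)
    for x, value in phi.items():
        level_sets[value].append(x)
    return sum(a * sum(mu[x] for x in E) for a, E in level_sets.items())


mu = {"p": Fraction(1, 2), "q": Fraction(1, 3), "r": Fraction(1, 6)}
phi = {"p": 2, "q": 0, "r": 2}
psi = {"p": 1, "q": 3, "r": 0}
phi_plus_psi = {x: phi[x] + psi[x] for x in mu}

print(integral_simple(phi, mu), integral_simple(psi, mu))  # 4/3 3/2
# Proposition 2.13b on this example:
assert integral_simple(phi_plus_psi, mu) == integral_simple(phi, mu) + integral_simple(psi, mu)
```

Exact fractions are used so that the additivity check is an equality rather than a floating-point approximation.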


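2.13 Proposition. Let $\phi$ and $\psi$ be simple functions in $L^+$.
a. If $c \ge 0$, $\int c\phi = c\int \phi$.
b. $\int(\phi + \psi) = \int \phi + \int \psi$.
c. If $\phi \le \psi$, then $\int \phi \le \int \psi$.
d. The map $A \mapsto \int_A \phi\,d\mu$ is a measure on $\mathcal{M}$.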
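Proof. (a) is trivial. For (b), let $\sum_1^n a_j\chi_{E_j}$ and $\sum_1^m b_k\chi_{F_k}$ be the standard representations of $\phi$ and $\psi$. Then $E_j = \bigcup_k (E_j \cap F_k)$ and $F_k = \bigcup_j (E_j \cap F_k)$ since $\bigcup_j E_j = \bigcup_k F_k = X$, and these unions are disjoint. Hence the finite additivity of $\mu$ implies that
$$\int \phi + \int \psi = \sum_{j,k} (a_j + b_k)\,\mu(E_j \cap F_k),$$
and the same reasoning shows that the sum on the right equals $\int(\phi + \psi)$. Moreover, if $\phi \le \psi$, then $a_j \le b_k$ whenever $E_j \cap F_k \neq \emptyset$, so
$$\int \phi = \sum_{j,k} a_j\,\mu(E_j \cap F_k) \le \sum_{j,k} b_k\,\mu(E_j \cap F_k) = \int \psi,$$
which proves (c). Finally, if $\{A_k\}$ is a disjoint sequence in $\mathcal{M}$ and $A = \bigcup_1^\infty A_k$, the countable additivity of $\mu$ gives
$$\int_A \phi = \sum_j a_j\,\mu(A \cap E_j) = \sum_{j,k} a_j\,\mu(A_k \cap E_j) = \sum_k \int_{A_k} \phi,$$
which establishes (d). ∎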


We now extend the integral to all functions $f \in L^+$ by defining
$$\int f\,d\mu = \sup\Bigl\{\int \phi\,d\mu : 0 \le \phi \le f,\ \phi \text{ simple}\Bigr\}.$$


By Proposition 2.13c, the two definitions of $\int f$ agree when $f$ is simple, as the family of simple functions over which the supremum is taken includes $f$ itself. Moreover, it is obvious from the definition that
$$\int f \le \int g \text{ whenever } f \le g, \qquad\text{and}\qquad \int cf = c\int f \text{ for all } c \in [0,\infty).$$
The next step is to establish one of the fundamental convergence theorems. 


2.14 The Monotone Convergence Theorem. If $\{f_n\}$ is a sequence in $L^+$ such that $f_j \le f_{j+1}$ for all $j$, and $f = \lim_{n\to\infty} f_n$ $(= \sup_n f_n)$, then $\int f = \lim_{n\to\infty} \int f_n$.


Proof. $\{\int f_n\}$ is an increasing sequence of numbers, so its limit exists (possibly equal to $\infty$). Moreover, $\int f_n \le \int f$ for all $n$, so $\lim \int f_n \le \int f$. To establish the reverse inequality, fix $\alpha \in (0,1)$, let $\phi$ be a simple function with $0 \le \phi \le f$, and let $E_n = \{x : f_n(x) \ge \alpha\phi(x)\}$. Then $\{E_n\}$ is an increasing sequence of measurable sets whose union is $X$, and we have $\int f_n \ge \int_{E_n} f_n \ge \alpha\int_{E_n} \phi$. By Proposition 2.13d and Theorem 1.8c, $\lim \int_{E_n} \phi = \int \phi$, and hence $\lim \int f_n \ge \alpha\int \phi$. Since this is true for all $\alpha < 1$, it remains true for $\alpha = 1$, and taking the supremum over all simple $\phi \le f$, we obtain $\lim \int f_n \ge \int f$. ∎
The monotone convergence theorem is an essential tool in many situations, but its immediate significance for us is as follows. The definition of $\int f$ involves the supremum over a huge (usually uncountable) family of simple functions, so it may be difficult to evaluate $\int f$ directly from the definition. The monotone convergence theorem, however, assures us that to compute $\int f$ it is enough to compute $\lim \int \phi_n$ where $\{\phi_n\}$ is any sequence of simple functions that increase to $f$, and Theorem 2.10 guarantees that such sequences exist. As a first application, we establish the additivity of the integral.
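As an illustration of this remark, take $X = [0,1]$ with Lebesgue measure and $f(x) = x$, and use the sequence $\{\phi_n\}$ from the proof of Theorem 2.10a. Here $F_n = \emptyset$, and $E_n^k = (k2^{-n}, (k+1)2^{-n}]$ has measure $2^{-n}$ for $k < 2^n$ and is empty otherwise. Hence
$$\int \phi_n = \sum_{k=0}^{2^n-1} k2^{-n}\cdot 2^{-n} = \frac{1 - 2^{-n}}{2},$$
and the monotone convergence theorem gives $\int_0^1 x\,dx = \frac12$. The Python sketch below recomputes $\int \phi_n$ exactly from the interval lengths and compares the result with this closed form. The function name is illustrative.

```python
from fractions import Fraction


def integral_phi_n_identity(n):
    """Exact integral over [0, 1] of phi_n (Theorem 2.10a) for f(x) = x.

    phi_n takes the value k 2^{-n} on E_n^k = (k 2^{-n}, (k+1) 2^{-n}] intersected
    with [0, 1]; the set F_n = f^{-1}((2^n, inf]) is empty because f <= 1 <= 2^n.
    """
    step = Fraction(1, 2 ** n)
    one = Fraction(1)
    total = Fraction(0)
    for k in range(2 ** (2 * n)):
        left, right = k * step, (k + 1) * step
        # Lebesgue measure of (left, right] intersected with [0, 1]
        measure = max(Fraction(0), min(right, one) - min(left, one))
        total += (k * step) * measure
    return total


for n in range(6):
    closed_form = (1 - Fraction(1, 2 ** n)) / 2
    assert integral_phi_n_identity(n) == closed_form
    print(n, integral_phi_n_identity(n))   # increases toward 1/2
```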


2.15 Theorem. If $\{f_n\}$ is a finite or infinite sequence in $L^+$ and $f = \sum_n f_n$, then $\int f = \sum_n \int f_n$.


Proof. First consider two functions $f_1$ and $f_2$. By Theorem 2.10 we can find sequences $\{\phi_j\}$ and $\{\psi_j\}$ of nonnegative simple functions that increase to $f_1$ and $f_2$. Then $\{\phi_j + \psi_j\}$ increases to $f_1 + f_2$, so by the monotone convergence theorem and Proposition 2.13b,
$$\int (f_1 + f_2) = \lim \int (\phi_j + \psi_j) = \lim \int \phi_j + \lim \int \psi_j = \int f_1 + \int f_2.$$
Hence, by induction, $\int \sum_1^N f_n = \sum_1^N \int f_n$ for any finite $N$. Letting $N \to \infty$ and applying the monotone convergence theorem again, we obtain $\int \sum_1^\infty f_n = \sum_1^\infty \int f_n$. ∎


2.16 Proposition. If $f \in L^+$, then $\int f = 0$ iff $f = 0$ a.e.


Proof. This is obvious if $f$ is simple: if $f = \sum_1^n a_j\chi_{E_j}$ with $a_j \ge 0$, then $\int f = 0$ iff for each $j$ either $a_j = 0$ or $\mu(E_j) = 0$. In general, if $f = 0$ a.e. and $\phi$ is simple with $0 \le \phi \le f$, then $\phi = 0$ a.e., hence $\int f = \sup_{\phi \le f} \int \phi = 0$. On the other hand, $\{x : f(x) > 0\} = \bigcup_1^\infty E_n$ where $E_n = \{x : f(x) > n^{-1}\}$, so if it is false that $f = 0$ a.e., we must have $\mu(E_n) > 0$ for some $n$. But then $f \ge n^{-1}\chi_{E_n}$, so $\int f \ge n^{-1}\mu(E_n) > 0$. ∎


2.17 Corollary. If $\{f_n\} \subset L^+$, $f \in L^+$, and $f_n(x)$ increases to $f(x)$ for a.e. $x$, then $\int f = \lim \int f_n$.


Proof. If $f_n(x)$ increases to $f(x)$ for $x \in E$, where $\mu(E^c) = 0$, then $f - f\chi_E = 0$ a.e. and $f_n - f_n\chi_E = 0$ a.e., so by the monotone convergence theorem,
$$\int f = \int f\chi_E = \lim \int f_n\chi_E = \lim \int f_n. \qquad ∎$$


The hypothesis that the sequence $\{f_n\}$ be increasing, at least a.e., is essential for the monotone convergence theorem. For example, if $X$ is $\mathbb{R}$ and $\mu$ is Lebesgue measure, we have $\chi_{(n,n+1)} \to 0$ and $n\chi_{(0,1/n)} \to 0$ pointwise, but $\int \chi_{(n,n+1)} = \int n\chi_{(0,1/n)} = 1$ for all $n$. As one sees by sketching the graphs, the trouble in these examples is that the area under the graph “escapes to infinity” as $n \to \infty$, so the area in the limit is less than one would expect. This is typical of the cases when the integral of the limit is not the limit of the integrals, but in this situation there is still an inequality that remains valid. We deduce it from the following general result.


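2.18 Fatou’s Lemma. If $\{f_n\}$ is any sequence in $L^+$, then
$$\int (\liminf f_n) \le \liminf \int f_n.$$


Proof. For each $k \ge 1$ we have $\inf_{n \ge k} f_n \le f_j$ for $j \ge k$, hence $\int \inf_{n \ge k} f_n \le \int f_j$ for $j \ge k$, hence $\int \inf_{n \ge k} f_n \le \inf_{j \ge k} \int f_j$. Now let $k \to \infty$ and apply the monotone convergence theorem:
$$\int (\liminf f_n) = \lim_{k\to\infty} \int \Bigl(\inf_{n \ge k} f_n\Bigr) \le \liminf \int f_n.$$
∎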


2.19 Corollary. If $\{f_n\} \subset L^+$, $f \in L^+$, and $f_n \to f$ a.e., then $\int f \le \liminf \int f_n$.


Proof. If $f_n \to f$ everywhere, the result is immediate from Fatou’s lemma, and this can be achieved by modifying $f_n$ and $f$ on a null set without affecting the integrals, by Proposition 2.16. ∎
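The sequence $f_n = n\chi_{(0,1/n)}$ from the discussion before Fatou’s lemma shows that the inequality in Corollary 2.19 can be strict: $f_n \to 0$ everywhere, yet $\int f_n = 1$ for every $n$. A minimal Python sketch of this bookkeeping follows, using exact rationals and illustrative function names. It checks that $f_n(x) = 0$ for all large $n$ at several sample points, and that each $\int f_n = n \cdot m((0,1/n)) = 1$.

```python
from fractions import Fraction


def f(n, x):
    """f_n(x) = n * chi_{(0, 1/n)}(x)."""
    return n if 0 < x < Fraction(1, n) else 0


def integral_f(n):
    """Integral of the simple function f_n: the value n times m((0, 1/n)) = 1/n."""
    return n * Fraction(1, n)


# For x <= 0, f_n(x) = 0 for all n; for x > 0, f_n(x) = 0 as soon as 1/n <= x.
# So f_n -> 0 pointwise, and the integral of liminf f_n is 0.
for x in (Fraction(-1), Fraction(0), Fraction(1, 1000), Fraction(1, 3), Fraction(5)):
    assert all(f(n, x) == 0 for n in range(1000, 1100))

# ...while every integral equals 1, so liminf of the integrals is 1 > 0.
assert all(integral_f(n) == 1 for n in range(1, 100))
```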


2.20 Proposition. If $f \in L^+$ and $\int f < \infty$, then $\{x : f(x) = \infty\}$ is a null set and $\{x : f(x) > 0\}$ is σ-finite.


The proof is left to the reader (Exercise 12). 


Exercises 
12. Prove Proposition 2.20. (See Proposition 0.20, where a special case is proved.) 


13. Suppose $\{f_n\} \subset L^+$, $f_n \to f$ pointwise, and $\int f = \lim \int f_n < \infty$. Then $\int_E f = \lim \int_E f_n$ for all $E \in \mathcal{M}$. However, this need not be true if $\int f = \lim \int f_n = \infty$.


14. If $f \in L^+$, let $\lambda(E) = \int_E f\,d\mu$ for $E \in \mathcal{M}$. Then $\lambda$ is a measure on $\mathcal{M}$, and for any $g \in L^+$, $\int g\,d\lambda = \int fg\,d\mu$. (First suppose that $g$ is simple.)


15. If $\{f_n\} \subset L^+$, $f_n$ decreases pointwise to $f$, and $\int f_1 < \infty$, then $\int f = \lim \int f_n$.


16. If $f \in L^+$ and $\int f < \infty$, for every $\varepsilon > 0$ there exists $E \in \mathcal{M}$ such that $\mu(E) < \infty$ and $\int_E f > (\int f) - \varepsilon$.


17. Assume Fatou’s lemma and deduce the monotone convergence theorem from it. 
2.3 INTEGRATION OF COMPLEX FUNCTIONS 


We continue to work on a fixed measure space $(X,\mathcal{M},\mu)$. The integral defined in the previous section can be extended to real-valued measurable functions $f$ in an obvious way; namely, if $f^+$ and $f^-$ are the positive and negative parts of $f$ and at least one of $\int f^+$ and $\int f^-$ is finite, we define
$$\int f = \int f^+ - \int f^-.$$


We shall be mainly concerned with the case where $\int f^+$ and $\int f^-$ are both finite; we then say that $f$ is integrable. Since $|f| = f^+ + f^-$, it is clear that $f$ is integrable iff
$$\int |f| < \infty.$$


2.21 Proposition. The set of integrable real-valued functions on X is a real vector 
space, and the integral is a linear functional on it. 


Proof. The first assertion follows from the fact that $|af + bg| \le |a||f| + |b||g|$, and it is easy to check that $\int af = a\int f$ for any $a \in \mathbb{R}$. To show additivity, suppose that $f$ and $g$ are integrable and let $h = f + g$. Then $h^+ - h^- = f^+ - f^- + g^+ - g^-$, so $h^+ + f^- + g^- = h^- + f^+ + g^+$. By Theorem 2.15,
$$\int h^+ + \int f^- + \int g^- = \int h^- + \int f^+ + \int g^+,$$
and regrouping then yields the desired result:
$$\int h = \int h^+ - \int h^- = \int f^+ - \int f^- + \int g^+ - \int g^- = \int f + \int g.$$
∎


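Next, if $f$ is a complex-valued measurable function, we say that $f$ is integrable if $\int |f| < \infty$. More generally, if $E \in \mathcal{M}$, $f$ is integrable on $E$ if $\int_E |f| < \infty$. Since $|f| \le |\operatorname{Re} f| + |\operatorname{Im} f| \le 2|f|$, $f$ is integrable iff $\operatorname{Re} f$ and $\operatorname{Im} f$ are both integrable, and in this case we define
$$\int f = \int \operatorname{Re} f + i\int \operatorname{Im} f.$$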


It follows easily that the space of complex-valued integrable functions is a complex vector space and that the integral is a complex-linear functional on it. We denote this space (provisionally) by $L^1(\mu)$ (or $L^1(X,\mu)$, or $L^1(X)$, or simply $L^1$, depending on the context). The superscript 1 is standard notation, but it will not assume any significance for us until Chapter 6.

2.22 Proposition. If $f \in L^1$, then $|\int f| \le \int |f|$.


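Proof. This is trivial if $\int f = 0$ and almost trivial if $f$ is real, since
$$\Bigl|\int f\Bigr| = \Bigl|\int f^+ - \int f^-\Bigr| \le \int f^+ + \int f^- = \int |f|.$$
If $f$ is complex-valued and $\int f \neq 0$, let $\alpha = \overline{\operatorname{sgn}(\int f)}$. Then $|\int f| = \alpha\int f = \int \alpha f$. In particular, $\int \alpha f$ is real, so
$$\Bigl|\int f\Bigr| = \operatorname{Re}\int \alpha f = \int \operatorname{Re}(\alpha f) \le \int |\operatorname{Re}(\alpha f)| \le \int |\alpha f| = \int |f|.$$
∎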
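The rotation trick in this proof can be traced numerically on a finite measure space. The Python sketch below uses three points whose masses and complex values are assumptions chosen for illustration. Multiplying by $\alpha = \overline{\operatorname{sgn}(\int f)}$ turns $\int \alpha f$ into the nonnegative real number $|\int f|$, and the final comparison is the inequality $|\int f| \le \int |f|$.

```python
# Points of a finite measure space, their masses, and the values of f there.
mu = [0.5, 1.5, 2.0]
f = [1 + 1j, -2 + 0.5j, 3j]

integral = sum(v * m for v, m in zip(f, mu))           # integral of f
abs_integral = sum(abs(v) * m for v, m in zip(f, mu))  # integral of |f|

# alpha is the conjugate of sgn(integral of f), so alpha * (integral of f) = |integral of f|.
alpha = (integral / abs(integral)).conjugate()
rotated = sum(alpha * v * m for v, m in zip(f, mu))    # integral of alpha * f

print(abs(integral), rotated, abs_integral)
```

Without the conjugate, $\operatorname{sgn}(\int f)\int f$ would equal $(\int f)^2/|\int f|$, which is generally not real. This is why the proof uses $\alpha = \overline{\operatorname{sgn}(\int f)}$.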
2.23 Proposition.
a. If $f \in L^1$, then $\{x : f(x) \neq 0\}$ is σ-finite.
b. If $f, g \in L^1$, then $\int_E f = \int_E g$ for all $E \in \mathcal{M}$ iff $\int |f - g| = 0$ iff $f = g$ a.e.


Proof. (a) and the second equivalence in (b) follow from Propositions 2.20 and 2.16. If $\int |f - g| = 0$, then by Proposition 2.22, for any $E \in \mathcal{M}$,
$$\Bigl|\int_E f - \int_E g\Bigr| \le \int \chi_E |f - g| \le \int |f - g| = 0,$$
so that $\int_E f = \int_E g$. On the other hand, if $u = \operatorname{Re}(f - g)$, $v = \operatorname{Im}(f - g)$, and it is false that $f = g$ a.e., then at least one of $u^+$, $u^-$, $v^+$, and $v^-$ must be nonzero on a set of positive measure. If, say, $E = \{x : u^+(x) > 0\}$ has positive measure, then $\operatorname{Re}(\int_E f - \int_E g) = \int_E u^+ > 0$ since $u^- = 0$ on $E$; likewise in the other cases. ∎





This proposition shows that for the purposes of integration it makes no difference if we alter functions on null sets. Indeed, one can integrate functions $f$ that are only defined on a measurable set $E$ whose complement is null simply by defining $f$ to be zero (or anything else) on $E^c$. In this fashion we can treat $\overline{\mathbb{R}}$-valued functions that are finite a.e. as real-valued functions for the purposes of integration.

With this in mind, we shall find it more convenient to redefine $L^1(\mu)$ to be the set of equivalence classes of a.e.-defined integrable functions on $X$, where $f$ and $g$ are considered equivalent iff $f = g$ a.e. This new $L^1(\mu)$ is still a complex vector space (under pointwise a.e. addition and scalar multiplication). Although we shall henceforth view $L^1(\mu)$ as a space of equivalence classes, we shall still employ the notation “$f \in L^1(\mu)$” to mean that $f$ is an a.e.-defined integrable function. This minor abuse of notation is commonly accepted and rarely causes any confusion.

The new definition of $L^1(\mu)$ has two further advantages. First, if $\overline{\mu}$ is the completion of $\mu$, Proposition 2.12 yields a natural one-to-one correspondence between $L^1(\overline{\mu})$ and $L^1(\mu)$, so we can (and shall) identify these spaces. Second, $L^1$ is a metric space with distance function $\rho(f,g) = \int |f - g|$. (The triangle inequality is easily verified, and obviously $\rho(f,g) = \rho(g,f)$, but to obtain the condition that $\rho(f,g) = 0$ only when $f = g$, one must identify functions that are equal a.e., according to Proposition 2.23b.) We shall refer to convergence with respect to this metric as convergence in $L^1$; thus $f_n \to f$ in $L^1$ iff $\int |f_n - f| \to 0$.
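The need to identify functions that agree a.e. already shows up on a finite measure space with a point of mass zero. In this Python sketch, the masses and values are illustrative assumptions and `l1_distance` is a hypothetical name. The functions $f$ and $g$ differ only on the null set $\{z\}$, so $\rho(f,g) = 0$ although $f \neq g$ as functions. After passing to equivalence classes they become the same element of $L^1$.

```python
def l1_distance(f, g, mu):
    """rho(f, g) = integral of |f - g| on a finite measure space.

    mu maps each point to its mass; a point of mass 0 is a null set.
    """
    return sum(abs(f[x] - g[x]) * mu[x] for x in mu)


mu = {"a": 1.0, "b": 2.0, "z": 0.0}    # {"z"} is a null set
f = {"a": 1.0, "b": -1.0, "z": 5.0}
g = {"a": 1.0, "b": -1.0, "z": -7.0}   # g = f except on the null set {"z"}
h = {"a": 0.0, "b": 1j, "z": 0.0}

print(f == g, l1_distance(f, g, mu))   # False 0.0
# The triangle inequality holds for this pseudometric:
assert l1_distance(f, h, mu) <= l1_distance(f, g, mu) + l1_distance(g, h, mu)
```

On functions, $\rho$ is only a pseudometric. It becomes a metric precisely on the equivalence classes described above.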

We now present the last of the three basic convergence theorems (the other two being the monotone convergence theorem and Fatou’s lemma) and derive some useful consequences from it. In the context of integration on $\mathbb{R}$ with Lebesgue measure as in the discussion preceding Fatou’s lemma, the idea behind this theorem is that if $f_n \to f$ a.e. and the graph of $|f_n|$ is confined to a region of the plane with finite area so that the area beneath it cannot escape to infinity, then $\int f_n \to \int f$.


2.24 The Dominated Convergence Theorem. Let $\{f_n\}$ be a sequence in $L^1$ such that (a) $f_n \to f$ a.e., and (b) there exists a nonnegative $g \in L^1$ such that $|f_n| \le g$ a.e. for all $n$. Then $f \in L^1$ and $\int f = \lim_{n\to\infty} \int f_n$.
Proof. $f$ is measurable (perhaps after redefinition on a null set) by Propositions 2.11 and 2.12, and since $|f| \le g$ a.e., we have $f \in L^1$. By taking real and imaginary parts it suffices to assume that $f_n$ and $f$ are real-valued, in which case we have $g + f_n \ge 0$ a.e. and $g - f_n \ge 0$ a.e. Thus by Fatou’s lemma,
$$\int g + \int f \le \liminf \int (g + f_n) = \int g + \liminf \int f_n,$$
$$\int g - \int f \le \liminf \int (g - f_n) = \int g - \limsup \int f_n.$$
Therefore, $\liminf \int f_n \ge \int f \ge \limsup \int f_n$, and the result follows. ∎


2.25 Theorem. Suppose that $\{f_j\}$ is a sequence in $L^1$ such that $\sum_1^\infty \int |f_j| < \infty$. Then $\sum_1^\infty f_j$ converges a.e. to a function in $L^1$, and $\int \sum_1^\infty f_j = \sum_1^\infty \int f_j$.


Proof. By Theorem 2.15, $\int \sum_1^\infty |f_j| = \sum_1^\infty \int |f_j| < \infty$, so the function $g = \sum_1^\infty |f_j|$ is in $L^1$. In particular, by Proposition 2.20, $\sum_1^\infty |f_j(x)|$ is finite for a.e. $x$, and for each such $x$ the series $\sum_1^\infty f_j(x)$ converges. Moreover, $|\sum_1^n f_j| \le g$ for all $n$, so we can apply the dominated convergence theorem to the sequence of partial sums to obtain $\int \sum_1^\infty f_j = \sum_1^\infty \int f_j$. ∎


2.26 Theorem. If $f \in L^1(\mu)$ and $\varepsilon > 0$, there is an integrable simple function $\phi = \sum a_j\chi_{E_j}$ such that $\int |f - \phi|\,d\mu < \varepsilon$. (That is, the integrable simple functions are dense in $L^1$ in the $L^1$ metric.) If $\mu$ is a Lebesgue-Stieltjes measure on $\mathbb{R}$, the sets $E_j$ in the definition of $\phi$ can be taken to be finite unions of open intervals; moreover, there is a continuous function $g$ that vanishes outside a bounded interval such that
$$\int |f - g|\,d\mu < \varepsilon.$$


Proof. Let $\{\phi_n\}$ be as in Theorem 2.10b; then $\int |\phi_n - f| < \varepsilon$ for $n$ sufficiently large by the dominated convergence theorem, since $|\phi_n - f| \le 2|f|$. If $\phi_n = \sum a_j\chi_{E_j}$, where the $E_j$ are disjoint and the $a_j$ are nonzero, we observe that $\mu(E_j) = |a_j|^{-1}\int_{E_j} |\phi_n| \le |a_j|^{-1}\int |f| < \infty$. Moreover, if $E$ and $F$ are measurable sets, we have $\mu(E \,\triangle\, F) = \int |\chi_E - \chi_F|$. Thus if $\mu$ is a Lebesgue-Stieltjes measure on $\mathbb{R}$, by Proposition 1.20 we can approximate $\chi_{E_j}$ arbitrarily closely in the $L^1$ metric by finite sums of functions $\chi_{I_k}$, where the $I_k$'s are open intervals. Finally, if $I = (a,b)$ we can approximate $\chi_I$ in the $L^1$ metric by continuous functions that vanish outside $(a,b)$. (For example, given $\varepsilon > 0$, take $g$ to be the continuous function that equals 0 on $(-\infty,a]$ and $[b,\infty)$, equals 1 on $[a+\varepsilon, b-\varepsilon]$, and is linear on $[a,a+\varepsilon]$ and $[b-\varepsilon,b]$.) Putting these facts together, we obtain the desired assertions. ∎
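The bound in the last step can be made explicit. For $0 < \varepsilon \le (b - a)/2$, the function $|\chi_{(a,b)} - g|$ is supported on $[a,a+\varepsilon] \cup [b-\varepsilon,b]$, where its graph forms two triangles of base $\varepsilon$ and height 1. Hence $\int |\chi_{(a,b)} - g|\,dx = \varepsilon$ for Lebesgue measure. The Python sketch below checks this with a midpoint-rule approximation. The step count and function names are illustrative choices, not part of the text.

```python
def chi_interval(x, a, b):
    """Characteristic function of the open interval (a, b)."""
    return 1.0 if a < x < b else 0.0


def trapezoid(x, a, b, eps):
    """0 off (a, b), 1 on [a + eps, b - eps], linear on [a, a + eps] and [b - eps, b].

    Assumes 0 < eps <= (b - a) / 2, so the two ramps do not overlap.
    """
    if x <= a or x >= b:
        return 0.0
    if x < a + eps:
        return (x - a) / eps
    if x > b - eps:
        return (b - x) / eps
    return 1.0


def l1_error(a, b, eps, steps=100_000):
    """Midpoint-rule approximation of the integral of |chi_(a,b) - g| dx.

    Both functions vanish outside (a, b), so integrating over [a, b] suffices.
    """
    h = (b - a) / steps
    total = 0.0
    for i in range(steps):
        x = a + (i + 0.5) * h
        total += abs(chi_interval(x, a, b) - trapezoid(x, a, b, eps))
    return total * h


print(l1_error(0.0, 1.0, 0.1))   # close to eps = 0.1
```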


The next theorem gives a criterion, less restrictive than those found in most 
advanced calculus books, for the validity of interchanging a limit or a derivative with 
an integral. 
2.27 Theorem. Suppose that $f : X \times [a,b] \to \mathbb{C}$ ($-\infty < a < b < \infty$) and that $f(\cdot,t) : X \to \mathbb{C}$ is integrable for each $t \in [a,b]$. Let $F(t) = \int_X f(x,t)\,d\mu(x)$.

a. Suppose that there exists $g \in L^1(\mu)$ such that $|f(x,t)| \le g(x)$ for all $x, t$. If $\lim_{t\to t_0} f(x,t) = f(x,t_0)$ for every $x$, then $\lim_{t\to t_0} F(t) = F(t_0)$; in particular, if $f(x,\cdot)$ is continuous for each $x$, then $F$ is continuous.

b. Suppose that $\partial f/\partial t$ exists and there is a $g \in L^1(\mu)$ such that $|(\partial f/\partial t)(x,t)| \le g(x)$ for all $x, t$. Then $F$ is differentiable and $F'(t) = \int (\partial f/\partial t)(x,t)\,d\mu(x)$.


Proof. For (a), apply the dominated convergence theorem to $f_n(x) = f(x, t_n)$, where $\{t_n\}$ is any sequence in $[a,b]$ converging to $t_0$. For (b), observe that

$$\frac{\partial f}{\partial t}(x, t_0) = \lim h_n(x), \quad\text{where}\quad h_n(x) = \frac{f(x,t_n) - f(x,t_0)}{t_n - t_0},$$

$\{t_n\}$ again being any sequence converging to $t_0$. It follows that $\partial f/\partial t$ is measurable, and by the mean value theorem,

$$|h_n(x)| \le \sup_{t\in[a,b]} \left|\frac{\partial f}{\partial t}(x,t)\right| \le g(x),$$

so the dominated convergence theorem can be invoked again to give

$$F'(t_0) = \lim \frac{F(t_n) - F(t_0)}{t_n - t_0} = \lim \int h_n(x)\,d\mu(x) = \int \frac{\partial f}{\partial t}(x, t_0)\,d\mu(x).$$

∎


The device of using sequences converging to $t_0$ in the preceding proof is technically necessary, because the dominated convergence theorem deals only with sequences of functions. However, in such situations we shall usually just say "let $t \to t_0$," with the understanding that sequential convergence underlies the argument.

It is important to note that in Theorem 2.27 the interval $[a,b]$ on which the estimates on $f$ or $\partial f/\partial t$ hold might be a proper subinterval of an open interval $I$ (perhaps $\mathbb{R}$ itself) on which $f(x,\cdot)$ is defined. Suppose the hypotheses of (a) or (b) hold for all $[a,b] \subset I$, perhaps with the dominating function $g$ depending on $a$ and $b$. Then one obtains the continuity or differentiability of the integrated function $F$ on all of $I$, since these properties are local in nature.
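Theorem 2.27b can be illustrated numerically. The sketch below uses our own example, not one from the text. It takes $\mu$ to be Lebesgue measure on $[0,1]$ and $f(x,t) = \cos(xt)$, so $|\partial f/\partial t| = |x\sin(xt)| \le 1$, which is integrable on $[0,1]$. The theorem then says $F'(t) = \int_0^1 -x\sin(xt)\,dx$. The integrals are approximated by the midpoint rule, so the agreement holds only up to discretization error.

```python
import math


def midpoint(fn, lo, hi, n=10_000):
    """Composite midpoint rule for the integral of fn over [lo, hi]."""
    h = (hi - lo) / n
    return sum(fn(lo + (i + 0.5) * h) for i in range(n)) * h


def F(t):
    # F(t) = integral over x in [0, 1] of f(x, t) = cos(x t)
    return midpoint(lambda x: math.cos(x * t), 0.0, 1.0)


def F_prime_under_integral(t):
    # Theorem 2.27b: F'(t) = integral of (d/dt) cos(x t) = -x sin(x t);
    # the dominating function is g(x) = 1 since |x sin(x t)| <= 1 on [0, 1].
    return midpoint(lambda x: -x * math.sin(x * t), 0.0, 1.0)


def F_prime_difference(t, h=1e-4):
    # Central difference quotient of F, for comparison
    return (F(t + h) - F(t - h)) / (2 * h)


t = 2.0
exact = (t * math.cos(t) - math.sin(t)) / t**2  # derivative of sin(t)/t
print(F_prime_under_integral(t), F_prime_difference(t), exact)
```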

In the special case where the measure $\mu$ is Lebesgue measure on $\mathbb{R}$, the integral we have developed is called the Lebesgue integral. At this point it is appropriate to study the relation between the Lebesgue and Riemann integrals on $\mathbb{R}$. We shall use Darboux's characterization of the Riemann integral in terms of upper and lower sums, which we now recall.

Let $[a,b]$ be a compact interval. By a partition of $[a,b]$ we shall mean a finite sequence $P = \{t_j\}_0^n$ such that $a = t_0 < t_1 < \cdots < t_n = b$. Let $f$ be an arbitrary bounded real-valued function on $[a,b]$. For each partition $P$ we define

$$S_P f = \sum_1^n M_j(t_j - t_{j-1}), \qquad s_P f = \sum_1^n m_j(t_j - t_{j-1}),$$

where $M_j$ and $m_j$ are the supremum and infimum of $f$ on $[t_{j-1}, t_j]$. Then we define

$$\overline{I}_a^b(f) = \inf_P S_P f, \qquad \underline{I}_a^b(f) = \sup_P s_P f,$$

where the infimum and supremum are taken over all partitions $P$. If $\overline{I}_a^b(f) = \underline{I}_a^b(f)$, their common value is the Riemann integral $\int_a^b f(x)\,dx$, and $f$ is called Riemann integrable.
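The upper and lower sums can be computed directly when the suprema and infima are easy to locate. The Python sketch below assumes, for simplicity, that $f$ is non-decreasing and that $P$ is a uniform partition. Under those assumptions $M_j = f(t_j)$ and $m_j = f(t_{j-1})$. For $f(x) = x^2$ on $[0,1]$ the two sums squeeze $1/3$, and their difference telescopes to exactly $1/n$.

```python
def darboux_sums_increasing(f, a, b, n):
    """Upper and lower Darboux sums (S_P f, s_P f) on the uniform partition
    t_j = a + j (b - a) / n, assuming f is non-decreasing on [a, b].

    For non-decreasing f the sup and inf on [t_{j-1}, t_j] are attained at the
    endpoints: M_j = f(t_j) and m_j = f(t_{j-1}).
    """
    t = [a + j * (b - a) / n for j in range(n + 1)]
    upper = sum(f(t[j]) * (t[j] - t[j - 1]) for j in range(1, n + 1))
    lower = sum(f(t[j - 1]) * (t[j] - t[j - 1]) for j in range(1, n + 1))
    return upper, lower


for n in (10, 100, 1000):
    S, s = darboux_sums_increasing(lambda x: x * x, 0.0, 1.0, n)
    print(n, s, S, S - s)  # s <= 1/3 <= S, and S - s = 1/n for this f
```

Handling a general bounded $f$ would require computing a true supremum and infimum on each subinterval, which a finite sampling scheme cannot guarantee.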


2.28 Theorem. Let $f$ be a bounded real-valued function on $[a,b]$.

a. If $f$ is Riemann integrable, then $f$ is Lebesgue measurable (and hence integrable on $[a,b]$, since it is bounded), and $\int_a^b f(x)\,dx = \int_{[a,b]} f\,dm$.

b. $f$ is Riemann integrable iff $\{x \in [a,b] : f \text{ is discontinuous at } x\}$ has Lebesgue measure zero.


Proof. Suppose that $f$ is Riemann integrable. For each partition $P$ let

$$G_P = \sum_1^n M_j\chi_{(t_{j-1},t_j]}, \qquad g_P = \sum_1^n m_j\chi_{(t_{j-1},t_j]}$$

(with the same notation as above), so that $S_P f = \int G_P\,dm$ and $s_P f = \int g_P\,dm$. There is a sequence $\{P_k\}$ of partitions with the following properties:

- the mesh of $P_k$ (i.e., $\max_j(t_j - t_{j-1})$) tends to zero;
- each $P_k$ includes the preceding one, so that $g_{P_k}$ increases with $k$ while $G_{P_k}$ decreases;
- $S_{P_k}f$ and $s_{P_k}f$ converge to $\int_a^b f(x)\,dx$.

Let $G = \lim G_{P_k}$ and $g = \lim g_{P_k}$. Then $g \le f \le G$ on $(a,b]$, and by the dominated convergence theorem, $\int G\,dm = \int g\,dm = \int_a^b f(x)\,dx$. Hence $\int (G - g)\,dm = 0$, so $G = g$ a.e. by Proposition 2.16, and thus $G = f$ a.e. Since $G$ is measurable (being the limit of a sequence of simple functions) and $m$ is complete, $f$ is measurable and $\int_{[a,b]} f\,dm = \int G\,dm = \int_a^b f(x)\,dx$. This proves (a). The proof of (b) is outlined in Exercise 23. ∎


The (proper) Riemann integral is thus subsumed in the Lebesgue integral. Some improper Riemann integrals (the absolutely convergent ones) can be interpreted directly as Lebesgue integrals, but others still require a limiting procedure. For example, suppose $f$ is Riemann integrable on $[0,b]$ for all $b > 0$ and Lebesgue integrable on $[0,\infty)$. Then $\int_{[0,\infty)} f\,dm = \lim_{b\to\infty}\int_0^b f(x)\,dx$ by the dominated convergence theorem. However, the limit on the right can exist even when $f$ is not integrable. (Example: $f = \sum_1^\infty n^{-1}(-1)^n\chi_{[n,n+1)}$.) Henceforth we shall generally use the notation $\int f(x)\,dx$ for Lebesgue integrals.
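The example can be made concrete by restricting to integer endpoints $b = N$, a simplification we adopt here. In that case $\int_0^N f$ is a partial sum of the alternating harmonic series, which converges to $-\log 2$. By contrast, $\int_0^N |f|$ is a partial harmonic sum, which grows without bound. The short Python sketch below computes both.

```python
import math


def integral_up_to(N):
    """Integral over [0, N] (integer N >= 1) of f = sum_{n>=1} (-1)^n / n * chi_[n, n+1).

    On [0, N], f is constant (-1)^n / n on each [n, n+1) with 1 <= n < N.
    """
    return sum((-1) ** n / n for n in range(1, N))


def abs_integral_up_to(N):
    """Integral over [0, N] of |f|: a partial sum of the harmonic series."""
    return sum(1 / n for n in range(1, N))


for N in (10, 1_000, 100_000):
    print(N, integral_up_to(N), abs_integral_up_to(N))
print(-math.log(2))  # the limit of integral_up_to(N) as N -> infinity
```

For non-integer $b$ the value differs from the nearest integer case by at most $1/\lfloor b\rfloor$, so the limit over all real $b$ is the same.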

A few remarks comparing the construction of the Lebesgue and Riemann integrals may be helpful. Let $f$ be a bounded measurable function on $[a,b]$, and for simplicity let us assume that $f \ge 0$.

To compute the Riemann integral of $f$, one partitions the interval $[a,b]$ into subintervals and approximates $f$ from above and below by functions that are constant on each subinterval.

To compute the Lebesgue integral of $f$, one picks a sequence of simple functions that increase to $f$. In particular, if one picks the sequence constructed in the proof of Theorem 2.10a (see Figure 2.1), one is in effect partitioning the range of $f$ into subintervals $I_j$ and approximating $f$ by a constant on each of the sets $f^{-1}(I_j)$. This procedure requires a more sophisticated theory of measure to begin with, since the sets $f^{-1}(I_j)$ can be complicated even when $f$ is continuous. However, it is better adapted to the particular $f$ under consideration and is therefore more flexible and more susceptible to generalization. (In the Lebesgue theory, the assumption that $f$ is measurable removes the necessity of considering both upper and lower approximations. The latter point of view can nevertheless be made to work in the abstract setting; see Exercise 24.)
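The "partition the range" idea can be sketched pointwise. The Python sketch below assumes the standard dyadic construction: the range $[0, 2^n]$ is cut into the intervals $(k2^{-n}, (k+1)2^{-n}]$ for $0 \le k < 2^{2n}$, and the value $2^n$ is used where $f > 2^n$. We take this to be the construction referred to above. The function `phi` is our own name. Given a value $f(x) \ge 0$, it returns the level-$n$ approximation at $x$, using exact rational arithmetic so that interval endpoints are handled without rounding.

```python
import math
from fractions import Fraction


def phi(n, value):
    """Value at a point x of the level-n dyadic approximation to f, given value = f(x) >= 0.

    The range [0, 2^n] is cut into the intervals (k 2^-n, (k+1) 2^-n],
    0 <= k < 2^(2n): on f^{-1}((k 2^-n, (k+1) 2^-n]) the approximation is k 2^-n,
    on {f > 2^n} it is 2^n, and on {f = 0} it is 0.
    """
    value = Fraction(value)
    scale = 2 ** n
    if value > scale:
        return Fraction(scale)
    if value == 0:
        return Fraction(0)
    k = math.ceil(value * scale) - 1   # k 2^-n < value <= (k+1) 2^-n
    return Fraction(k, scale)


# The approximations increase with n and stay below f(x)
print([phi(n, Fraction(1, 3)) for n in range(1, 5)])
```

The constant chosen on each piece depends only on which range interval $f(x)$ falls into, not on where $x$ lies in $[a,b]$. That is exactly the contrast with the Riemann construction.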

The Lebesgue theory offers two real advantages over the Riemann theory. First, 
much more powerful convergence theorems, such as the monotone and dominated 
convergence theorems, are available. These not only yield results previously unob- 
tainable but also reduce the labor in proving classical theorems. Second, a wider 
class of functions can be integrated. For example, if $R$ is the set of rational numbers in $[0,1]$, then $\chi_R$ is not Riemann integrable, being everywhere discontinuous on $[0,1]$. It is, however, Lebesgue integrable, and $\int \chi_R\,dm = 0$. (Actually, this is in some sense a trivial example, since $\chi_R$ agrees a.e. with the constant function 0. For a more interesting example, see Exercise 25.)

Of course, virtually all functions that one meets in classical analysis are (locally) Riemann integrable, so this added generality is rarely used in computing specific integrals. However, it has a crucial effect: various metric spaces of functions whose metrics are defined in terms of integrals are complete when Lebesgue integrable functions are used, but not when one considers only Riemann integrable functions. We shall investigate this situation more thoroughly later, especially in Chapter 6. (We have already proved the completeness of $L^1(\mu)$, disguised as Theorem 2.25. To remove the disguise, see Theorem 5.1.)

We conclude this section by introducing the most ubiquitous of the higher transcendental functions, the gamma function $\Gamma$, which will play a role in a number of places later on. If $z \in \mathbb{C}$ and $\operatorname{Re} z > 0$, define $f_z : (0,\infty) \to \mathbb{C}$ by $f_z(t) = t^{z-1}e^{-t}$. (Here $t^{z-1} = \exp[(z-1)\log t]$.)

Since $|t^{z-1}| = t^{\operatorname{Re} z - 1}$, we have $|f_z(t)| \le t^{\operatorname{Re} z - 1}$, and also $|f_z(t)| \le C_z e^{-t/2}$ for $t > 1$. (The precise value of $C_z$ can easily be found by maximizing $t^{\operatorname{Re} z - 1}e^{-t/2}$, but it is of no importance here.) Since $\int_0^1 t^a\,dt < \infty$ for $a > -1$ and $\int_1^\infty e^{-t/2}\,dt < \infty$, we see that $f_z \in L^1((0,\infty))$ for $\operatorname{Re} z > 0$, and we define

$$\Gamma(z) = \int_0^\infty t^{z-1}e^{-t}\,dt \qquad (\operatorname{Re} z > 0).$$

Integration by parts gives

$$\int_\epsilon^N t^z e^{-t}\,dt = -t^z e^{-t}\Big|_\epsilon^N + z\int_\epsilon^N t^{z-1}e^{-t}\,dt.$$

Letting $\epsilon \to 0$ and $N \to \infty$, we see that for $\operatorname{Re} z > 0$, $\Gamma$ satisfies the functional equation

$$\Gamma(z+1) = z\Gamma(z).$$

This equation can then be used to extend $\Gamma$ to (almost) the entire complex plane. Namely, for $-1 < \operatorname{Re} z \le 0$ we can define $\Gamma(z)$ to be $\Gamma(z+1)/z$. By induction, having defined $\Gamma(z)$ for $\operatorname{Re} z > -n$, we define $\Gamma(z)$ for $\operatorname{Re} z > -n-1$ to be $\Gamma(z+1)/z$. The result is a function defined on all of $\mathbb{C}$ except for singularities at the nonpositive integers, where the algorithm just described involves division by zero.

We have $\Gamma(1) = \int_0^\infty e^{-t}\,dt = -e^{-t}\big|_0^\infty = 1$, so an $n$-fold application of the functional equation shows that $\Gamma(n+1) = n!$. (Another proof of this fact is outlined in Exercise 29.) Many of the applications of the gamma function involve the fact that it provides an extension of the factorial function to nonintegers.
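The extension step can be sketched for real arguments. The Python sketch below assumes real $z$ only, and `gamma_extended` is our own name. It uses the standard-library `math.gamma` only on $(0,\infty)$, where the integral definition applies. For other $z$ it applies $\Gamma(z) = \Gamma(z+1)/z$ repeatedly until the argument becomes positive, mirroring the induction above.

```python
import math


def gamma_extended(z):
    """Gamma(z) for real z that is not a nonpositive integer.

    For z > 0 this uses math.gamma (the integral definition applies there);
    otherwise it applies Gamma(z) = Gamma(z + 1) / z repeatedly, shifting z
    upward until it is positive.
    """
    if z <= 0 and z == int(z):
        raise ValueError("Gamma has singularities at the nonpositive integers")
    if z > 0:
        return math.gamma(z)
    return gamma_extended(z + 1) / z


print([gamma_extended(n + 1) for n in range(5)])   # 0!, 1!, 2!, 3!, 4!
print(gamma_extended(-0.5), -2 * math.sqrt(math.pi))
```

The value $\Gamma(-1/2) = \Gamma(1/2)/(-1/2) = -2\sqrt{\pi}$ follows from one application of the functional equation, since $\Gamma(1/2) = \sqrt{\pi}$.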


Exercises 


18. Fatou's lemma remains valid if the hypothesis that $f_n \in L^+$ is replaced by the hypothesis that $f_n$ is measurable and $f_n \ge -g$, where $g \in L^+ \cap L^1$. What is the analogue of Fatou's lemma for nonpositive functions?


19. Suppose $\{f_n\} \subset L^1(\mu)$ and $f_n \to f$ uniformly.
a. If $\mu(X) < \infty$, then $f \in L^1(\mu)$ and $\int f_n \to \int f$.
b. If $\mu(X) = \infty$, the conclusions of (a) can fail. (Find examples on $\mathbb{R}$ with Lebesgue measure.)


20. (A generalized Dominated Convergence Theorem) If $f_n, g_n, f, g \in L^1$, $f_n \to f$ and $g_n \to g$ a.e., $|f_n| \le g_n$, and $\int g_n \to \int g$, then $\int f_n \to \int f$. (Rework the proof of the dominated convergence theorem.)


21. Suppose $f_n, f \in L^1$ and $f_n \to f$ a.e. Then $\int |f_n - f| \to 0$ iff $\int |f_n| \to \int |f|$. (Use Exercise 20.)


22. Let u be counting measure on N. Interpret Fatou’s lemma and the monotone and 
dominated convergence theorems as statements about infinite series. 


23. Given a bounded function $f : [a,b] \to \mathbb{R}$, let

$$H(x) = \lim_{\delta\to 0}\sup_{|y-x|\le\delta} f(y), \qquad h(x) = \lim_{\delta\to 0}\inf_{|y-x|\le\delta} f(y).$$

Prove Theorem 2.28b by establishing the following lemmas:
a. $H(x) = h(x)$ iff $f$ is continuous at $x$.
b. In the notation of the proof of Theorem 2.28a, $H = G$ a.e. and $h = g$ a.e. Hence $H$ and $h$ are Lebesgue measurable, and $\int_{[a,b]} H\,dm = \overline{I}_a^b(f)$ and $\int_{[a,b]} h\,dm = \underline{I}_a^b(f)$.


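24. Let $(X,\mathcal{M},\mu)$ be a measure space with $\mu(X) < \infty$, and let $(X,\overline{\mathcal{M}},\overline{\mu})$ be its completion. Suppose $f : X \to \mathbb{R}$ is bounded. Then $f$ is $\overline{\mathcal{M}}$-measurable (and hence in $L^1(\overline{\mu})$) iff there exist sequences $\{\phi_n\}$ and $\{\psi_n\}$ of $\mathcal{M}$-measurable simple functions such that $\phi_n \le f \le \psi_n$ and $\int (\psi_n - \phi_n)\,d\mu < n^{-1}$. In this case, $\lim \int \phi_n\,d\mu = \lim \int \psi_n\,d\mu = \int f\,d\overline{\mu}$.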

25. Let $f(x) = x^{-1/2}$ if $0 < x < 1$, and $f(x) = 0$ otherwise. Let $\{r_n\}_1^\infty$ be an enumeration of the rationals, and set $g(x) = \sum_1^\infty 2^{-n} f(x - r_n)$.

a. $g \in L^1(m)$, and in particular $g < \infty$ a.e.







b. g is discontinuous at every point and unbounded on every interval, and it 
remains so after any modification on a Lebesgue null set. 


c. $g^2 < \infty$ a.e., but $g^2$ is not integrable on any interval.


26. If $f \in L^1(m)$ and $F(x) = \int_{-\infty}^x f(t)\,dt$, then $F$ is continuous on $\mathbb{R}$.


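27. Let $f_n(x) = ae^{-nax} - be^{-nbx}$, where $0 < a < b$.
a. $\sum_1^\infty \int_0^\infty |f_n(x)|\,dx = \infty$.
b. $\sum_1^\infty \int_0^\infty f_n(x)\,dx = 0$.
c. $\sum_1^\infty f_n \in L^1([0,\infty), m)$, and $\int_0^\infty \sum_1^\infty f_n(x)\,dx = \log(b/a)$.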
28. Compute the following limits and justify the calculations:
a. $\lim_{n\to\infty}\int_0^\infty (1 + (x/n))^{-n}\sin(x/n)\,dx$.
b. $\lim_{n\to\infty}\int_0^1 (1 + nx^2)(1 + x^2)^{-n}\,dx$.
c. $\lim_{n\to\infty}\int_0^\infty n\sin(x/n)[x(1 + x^2)]^{-1}\,dx$.
d. $\lim_{n\to\infty}\int_a^\infty n(1 + n^2x^2)^{-1}\,dx$. (The answer depends on whether $a > 0$, $a = 0$, or $a < 0$. How does this accord with the various convergence theorems?)


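29. Show that $\int_0^\infty x^n e^{-x}\,dx = n!$ by differentiating the equation $\int_0^\infty e^{-tx}\,dx = 1/t$. Similarly, show that $\int_{-\infty}^\infty x^{2n}e^{-x^2}\,dx = (2n)!\sqrt{\pi}/4^n n!$ by differentiating the equation $\int_{-\infty}^\infty e^{-tx^2}\,dx = \sqrt{\pi/t}$ (see Proposition 2.53).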


30. Show that $\lim_{k\to\infty}\int_0^k x^n(1 - k^{-1}x)^k\,dx = n!$.


31. Derive the following formulas by expanding part of the integrand into an infinite series and justifying the term-by-term integration. Exercise 29 may be useful. (Note: In (d) and (e), term-by-term integration works, and the resulting series converges, only for $a > 1$, but the formulas as stated are actually valid for all $a > 0$.)

a. For $a > 0$, $\int_{-\infty}^\infty e^{-x^2}\cos ax\,dx = \sqrt{\pi}\,e^{-a^2/4}$.
b. For $a > -1$, $\int_0^1 x^a(1 - x)^{-1}\log x\,dx = -\sum_1^\infty (a + k)^{-2}$.
c. For $a > 1$, $\int_0^\infty x^{a-1}(e^x - 1)^{-1}\,dx = \Gamma(a)\zeta(a)$, where $\zeta(a) = \sum_1^\infty n^{-a}$.
d. For $a > 1$, $\int_0^\infty e^{-ax}x^{-1}\sin x\,dx = \arctan(a^{-1})$.
e. For $a > 1$, $\int_0^\infty e^{-ax}J_0(x)\,dx = (a^2 + 1)^{-1/2}$, where $J_0(x) = \sum_0^\infty (-1)^n x^{2n}/4^n(n!)^2$ is the Bessel function of order zero.


2.4 MODES OF CONVERGENCE 


If $\{f_n\}$ is a sequence of complex-valued functions on a set $X$, the statement "$f_n \to f$ as $n \to \infty$" can be taken in many different senses, for example, pointwise or uniform convergence. If $X$ is a measure space, one can also speak of a.e. convergence or convergence in $L^1$. Of course, uniform convergence implies pointwise convergence, which in turn implies a.e. convergence (and not conversely, in general). However, these modes of convergence do not imply $L^1$ convergence, or vice versa. It will be useful to keep in mind the following examples on $\mathbb{R}$ (with Lebesgue measure):


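i. $f_n = n^{-1}\chi_{(0,n)}$.
ii. $f_n = \chi_{(n,n+1)}$.
iii. $f_n = n\chi_{(0,1/n)}$.
iv. $f_1 = \chi_{[0,1]}$, $f_2 = \chi_{[0,1/2]}$, $f_3 = \chi_{[1/2,1]}$, $f_4 = \chi_{[0,1/4]}$, $f_5 = \chi_{[1/4,1/2]}$, $f_6 = \chi_{[1/2,3/4]}$, $f_7 = \chi_{[3/4,1]}$, and in general, $f_n = \chi_{[j/2^k,(j+1)/2^k]}$, where $n = 2^k + j$ with $0 \le j < 2^k$.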


In (i), (ii), and (iii), $f_n \to 0$ uniformly, pointwise, and a.e., respectively, but $f_n \not\to 0$ in $L^1$ (in fact $\int |f_n| = \int f_n = 1$ for all $n$). In (iv), $f_n \to 0$ in $L^1$, since $\int |f_n| = 2^{-k}$ for $2^k \le n < 2^{k+1}$. However, $f_n(x)$ does not converge for any $x \in [0,1]$, since there are infinitely many $n$ for which $f_n(x) = 0$ and infinitely many for which $f_n(x) = 1$.
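The bookkeeping in example (iv) is easy to mechanize. The Python sketch below uses our own helper names. It recovers $k$ and $j$ from $n = 2^k + j$ via the bit length of $n$, and checks that the $L^1$ norm $2^{-k}$ shrinks. It also checks that a fixed point such as $x = 0.3$ is hit exactly once in every sweep $2^k \le n < 2^{k+1}$, so $f_n(x)$ takes both values 0 and 1 infinitely often.

```python
def typewriter_interval(n):
    """For n >= 1 write n = 2^k + j with 0 <= j < 2^k; f_n is the indicator of
    [j / 2^k, (j + 1) / 2^k]. Returns (k, (left, right))."""
    k = n.bit_length() - 1      # largest k with 2^k <= n
    j = n - 2 ** k
    return k, (j / 2 ** k, (j + 1) / 2 ** k)


def f(n, x):
    _, (left, right) = typewriter_interval(n)
    return 1 if left <= x <= right else 0


def l1_norm(n):
    # integral of |f_n| = length of the interval = 2^-k
    k, _ = typewriter_interval(n)
    return 2.0 ** -k


x = 0.3
for k in range(1, 6):
    block = [f(n, x) for n in range(2 ** k, 2 ** (k + 1))]
    # for this x (not a dyadic rational), each sweep hits x exactly once
    print(k, l1_norm(2 ** k), block.count(1), block.count(0))
```

A dyadic rational such as $x = 1/2$ lies in two closed intervals of some sweeps, so the count of 1's there can be 2. The conclusion that $f_n(x)$ diverges is unchanged.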

On the other hand, if $f_n \to f$ a.e. and $|f_n| \le g \in L^1$ for all $n$, then $f_n \to f$ in $L^1$. (This is clear from the dominated convergence theorem, since $|f_n - f| \le 2g$.) Also, we shall see below that if $f_n \to f$ in $L^1$, then some subsequence converges to $f$ a.e.

Another mode of convergence that is frequently useful is convergence in measure. We say that a sequence $\{f_n\}$ of measurable complex-valued functions on $(X,\mathcal{M},\mu)$ is Cauchy in measure if for every $\epsilon > 0$,

$$\mu(\{x : |f_n(x) - f_m(x)| \ge \epsilon\}) \to 0 \quad\text{as } m, n \to \infty,$$

and that $\{f_n\}$ converges in measure to $f$ if for every $\epsilon > 0$,

$$\mu(\{x : |f_n(x) - f(x)| \ge \epsilon\}) \to 0 \quad\text{as } n \to \infty.$$

For example, the sequences (i), (iii), and (iv) above converge to zero in measure, but (ii) is not Cauchy in measure.
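For the four examples, the measures in these definitions have closed forms. The Python sketch below hard-codes them for $\epsilon > 0$; the function names are ours. In (i), (iii), and (iv) the measure of $\{|f_n| \ge \epsilon\}$ eventually tends to 0. In (ii), $|f_n - f_m| = 1$ on a set of measure 2 whenever $n \ne m$ (and $0 < \epsilon \le 1$), so that sequence cannot be Cauchy in measure.

```python
def measure_i(n, eps):
    # f_n = (1/n) chi_(0,n): {|f_n| >= eps} is (0, n) if 1/n >= eps, else empty
    return float(n) if 1 / n >= eps else 0.0


def measure_iii(n, eps):
    # f_n = n chi_(0,1/n): {|f_n| >= eps} is (0, 1/n) as soon as n >= eps
    return 1 / n if n >= eps else 0.0


def measure_iv(n, eps):
    # f_n = indicator of a dyadic interval of length 2^-k, 2^k <= n < 2^(k+1)
    k = n.bit_length() - 1
    return 2.0 ** -k if 0 < eps <= 1 else 0.0


def cauchy_gap_ii(n, m, eps):
    # f_n = chi_(n,n+1): for integers n != m, |f_n - f_m| = 1 on the disjoint
    # union (n, n+1) u (m, m+1), a set of measure 2, for any 0 < eps <= 1
    return 2.0 if n != m and 0 < eps <= 1 else 0.0


eps = 0.5
for n in (1, 10, 1000):
    print(n, measure_i(n, eps), measure_iii(n, eps), measure_iv(n, eps),
          cauchy_gap_ii(n, n + 1, eps))
```

Example (i) shows that the measure can be large for small $n$ (it equals $n$ while $1/n \ge \epsilon$) and still drop to 0 eventually. Only the tail behavior matters.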


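2.29 Proposition. If $f_n \to f$ in $L^1$, then $f_n \to f$ in measure.


Proof. Let $E_{n,\epsilon} = \{x : |f_n(x) - f(x)| \ge \epsilon\}$. Then $\int |f_n - f| \ge \int_{E_{n,\epsilon}} |f_n - f| \ge \epsilon\,\mu(E_{n,\epsilon})$, so $\mu(E_{n,\epsilon}) \le \epsilon^{-1}\int |f_n - f| \to 0$. ∎

The converse of Proposition 2.29 is false, as examples (i) and (iii) show.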





2.30 Theorem. Suppose that $\{f_n\}$ is Cauchy in measure. Then there is a measurable function $f$ such that $f_n \to f$ in measure, and there is a subsequence $\{f_{n_j}\}$ that converges to $f$ a.e. Moreover, if also $f_n \to g$ in measure, then $g = f$ a.e.


Proof. We can choose a subsequence $\{g_j\} = \{f_{n_j}\}$ of $\{f_n\}$ such that, if $E_j = \{x : |g_j(x) - g_{j+1}(x)| \ge 2^{-j}\}$, then $\mu(E_j) \le 2^{-j}$. If $F_k = \bigcup_k^\infty E_j$, then $\mu(F_k) \le \sum_k^\infty 2^{-j} = 2^{1-k}$, and if $x \notin F_k$, for $i > j \ge k$ we have

$$|g_j(x) - g_i(x)| \le \sum_{l=j}^{i-1} |g_{l+1}(x) - g_l(x)| \le \sum_{l=j}^{i-1} 2^{-l} \le 2^{1-j}. \tag{2.31}$$

Thus $\{g_j\}$ is pointwise Cauchy on $F_k^c$. Let $F = \bigcap_1^\infty F_k = \limsup E_j$. Then $\mu(F) = 0$. Set $f(x) = \lim g_j(x)$ for $x \notin F$ and $f(x) = 0$ for $x \in F$. Then $f$ is measurable (see Exercises 3 and 5) and $g_j \to f$ a.e. Also, (2.31) shows that $|g_j(x) - f(x)| \le 2^{1-j}$ for $x \notin F_k$ and $j \ge k$. Since $\mu(F_k) \to 0$ as $k \to \infty$, it follows that $g_j \to f$ in measure. But then $f_n \to f$ in measure, because

$$\{x : |f_n(x) - f(x)| \ge \epsilon\} \subset \{x : |f_n(x) - g_j(x)| \ge \tfrac12\epsilon\} \cup \{x : |g_j(x) - f(x)| \ge \tfrac12\epsilon\},$$

and the sets on the right both have small measure when $n$ and $j$ are large. Likewise, if $f_n \to g$ in measure,

$$\{x : |f(x) - g(x)| \ge \epsilon\} \subset \{x : |f(x) - f_n(x)| \ge \tfrac12\epsilon\} \cup \{x : |f_n(x) - g(x)| \ge \tfrac12\epsilon\}$$

for all $n$. Hence $\mu(\{x : |f(x) - g(x)| \ge \epsilon\}) = 0$ for all $\epsilon$. Letting $\epsilon$ tend to zero through some sequence of values, we conclude that $f = g$ a.e. ∎


2.32 Corollary. If $f_n \to f$ in $L^1$, there is a subsequence $\{f_{n_j}\}$ such that $f_{n_j} \to f$ a.e.


Proof. Combine Proposition 2.29 and Theorem 2.30. ∎


If $f_n \to f$ a.e., it does not follow that $f_n \to f$ in measure, as example (ii) shows. However, this conclusion does hold on a finite measure space, where something considerably stronger is true.


2.33 Egoroff's Theorem. Suppose that $\mu(X) < \infty$, and $f_1, f_2, \ldots$ and $f$ are measurable complex-valued functions on $X$ such that $f_n \to f$ a.e. Then for every $\epsilon > 0$ there exists $E \subset X$ such that $\mu(E) < \epsilon$ and $f_n \to f$ uniformly on $E^c$.


Proof. Without loss of generality we may assume that $f_n \to f$ everywhere on $X$. For $k, n \in \mathbb{N}$ let

$$E_n(k) = \bigcup_{m=n}^\infty \{x : |f_m(x) - f(x)| \ge k^{-1}\}.$$

Then, for fixed $k$, $E_n(k)$ decreases as $n$ increases, and $\bigcap_{n=1}^\infty E_n(k) = \varnothing$. Since $\mu(X) < \infty$, we conclude that $\mu(E_n(k)) \to 0$ as $n \to \infty$. Given $\epsilon > 0$ and $k \in \mathbb{N}$, choose $n_k$ so large that $\mu(E_{n_k}(k)) < \epsilon 2^{-k}$, and let $E = \bigcup_{k=1}^\infty E_{n_k}(k)$. Then $\mu(E) < \epsilon$, and we have $|f_n(x) - f(x)| < k^{-1}$ for $n > n_k$ and $x \notin E$. Thus $f_n \to f$ uniformly on $E^c$. ∎


The type of convergence involved in the conclusion of Egoroff's theorem is sometimes called almost uniform convergence. It is not hard to see that almost uniform convergence implies a.e. convergence and convergence in measure (Exercise 39).
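The following standard illustration is our own and is not taken from the text. Let $f_n(x) = x^n$ on $[0,1]$ with Lebesgue measure. Then $f_n \to 0$ on $[0,1)$, hence a.e., but not uniformly on $[0,1)$, since $\sup_{[0,1)} x^n = 1$ for every $n$. Removing $E = (1-\delta, 1]$, a set of measure $\delta$, restores uniform convergence, because $\sup_{[0,1-\delta]} x^n = (1-\delta)^n \to 0$. The Python sketch below computes how large $n$ must be for a given tolerance. Note how that index grows as $\delta$ shrinks.

```python
def first_uniform_index(delta, tol):
    """Smallest n with sup over [0, 1 - delta] of x**n below tol (0 < delta < 1, 0 < tol < 1).

    The sup is attained at x = 1 - delta and decreases in n, so for every
    later index f_n(x) = x**n stays within tol of the limit 0 on [0, 1 - delta].
    """
    n = 1
    while (1 - delta) ** n >= tol:
        n += 1
    return n


for delta in (0.1, 0.01, 0.001):
    # E = (1 - delta, 1] has measure delta; outside E the convergence is uniform
    print(delta, first_uniform_index(delta, 0.01))
```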
Exercises


32. Suppose $\mu(X) < \infty$. If $f$ and $g$ are complex-valued measurable functions on $X$, define

$$\rho(f,g) = \int \frac{|f - g|}{1 + |f - g|}\,d\mu.$$

Then $\rho$ is a metric on the space of measurable functions if we identify functions that are equal a.e., and $f_n \to f$ with respect to this metric iff $f_n \to f$ in measure.


33. If $f_n \ge 0$ and $f_n \to f$ in measure, then $\int f \le \liminf \int f_n$.


34. Suppose $|f_n| \le g \in L^1$ and $f_n \to f$ in measure.


a. $\int f = \lim \int f_n$.
b. $f_n \to f$ in $L^1$.


35. $f_n \to f$ in measure iff for every $\epsilon > 0$ there exists $N \in \mathbb{N}$ such that $\mu(\{x : |f_n(x) - f(x)| \ge \epsilon\}) < \epsilon$ for $n \ge N$.


36. If $\mu(E_n) < \infty$ for $n \in \mathbb{N}$ and $\chi_{E_n} \to f$ in $L^1$, then $f$ is (a.e. equal to) the characteristic function of a measurable set.


37. Suppose that $f_n$ and $f$ are measurable complex-valued functions and $\phi : \mathbb{C} \to \mathbb{C}$.
a. If $\phi$ is continuous and $f_n \to f$ a.e., then $\phi\circ f_n \to \phi\circ f$ a.e.
b. If $\phi$ is uniformly continuous and $f_n \to f$ uniformly, almost uniformly, or in measure, then $\phi\circ f_n \to \phi\circ f$ uniformly, almost uniformly, or in measure, respectively.
c. There are counterexamples when the continuity assumptions on $\phi$ are not satisfied.


38. Suppose $f_n \to f$ in measure and $g_n \to g$ in measure.
a. $f_n + g_n \to f + g$ in measure.
b. $f_n g_n \to fg$ in measure if $\mu(X) < \infty$, but not necessarily if $\mu(X) = \infty$.


39. If $f_n \to f$ almost uniformly, then $f_n \to f$ a.e. and in measure.


40. In Egoroff's theorem, the hypothesis "$\mu(X) < \infty$" can be replaced by "$|f_n| \le g$ for all $n$, where $g \in L^1(\mu)$."


41. If $\mu$ is σ-finite and $f_n \to f$ a.e., there exist measurable $E_1, E_2, \ldots \subset X$ such that $\mu((\bigcup_1^\infty E_j)^c) = 0$ and $f_n \to f$ uniformly on each $E_j$.


42. Let $\mu$ be counting measure on $\mathbb{N}$. Then $f_n \to f$ in measure iff $f_n \to f$ uniformly.


43. Suppose that $\mu(X) < \infty$ and $f : X \times [0,1] \to \mathbb{C}$ is a function such that $f(\cdot, y)$ is measurable for each $y \in [0,1]$ and $f(x, \cdot)$ is continuous for each $x \in X$.
a. If $0 < \epsilon, \delta < 1$, then $E_{\epsilon\delta} = \{x : |f(x,y) - f(x,0)| \le \epsilon \text{ for all } y < \delta\}$ is measurable.
b. For any $\epsilon > 0$ there is a set $E \subset X$ such that $\mu(E) < \epsilon$ and $f(\cdot, y) \to f(\cdot, 0)$ uniformly on $E^c$ as $y \to 0$.
44. (Lusin's Theorem) If $f : [a,b] \to \mathbb{C}$ is Lebesgue measurable and $\epsilon > 0$, there is a compact set $E \subset [a,b]$ such that $m(E^c) < \epsilon$ and $f|E$ is continuous. (Use Egoroff's theorem and Theorem 2.26.)


2.5 PRODUCT MEASURES 


Let $(X,\mathcal{M},\mu)$ and $(Y,\mathcal{N},\nu)$ be measure spaces. We have already discussed the product σ-algebra $\mathcal{M}\otimes\mathcal{N}$ on $X\times Y$. We now construct a measure on $\mathcal{M}\otimes\mathcal{N}$ that is, in an obvious sense, the product of $\mu$ and $\nu$.

To begin with, we define a (measurable) rectangle to be a set of the form $A\times B$, where $A \in \mathcal{M}$ and $B \in \mathcal{N}$. Clearly

$$(A\times B)\cap(E\times F) = (A\cap E)\times(B\cap F), \qquad (A\times B)^c = (X\times B^c)\cup(A^c\times B).$$

Therefore, by Proposition 1.7, the collection $\mathcal{A}$ of finite disjoint unions of rectangles is an algebra. Of course, the σ-algebra it generates is $\mathcal{M}\otimes\mathcal{N}$.

Suppose $A\times B$ is a rectangle that is a (finite or countable) disjoint union of rectangles $A_j\times B_j$. Then for $x \in X$ and $y \in Y$,

$$\chi_A(x)\chi_B(y) = \chi_{A\times B}(x,y) = \sum \chi_{A_j\times B_j}(x,y) = \sum \chi_{A_j}(x)\chi_{B_j}(y).$$

If we integrate with respect to $x$ and use Theorem 2.15, we obtain

$$\mu(A)\chi_B(y) = \int \chi_A(x)\chi_B(y)\,d\mu(x) = \sum \int \chi_{A_j}(x)\chi_{B_j}(y)\,d\mu(x) = \sum \mu(A_j)\chi_{B_j}(y).$$

In the same way, integration in $y$ then yields

$$\mu(A)\nu(B) = \sum \mu(A_j)\nu(B_j).$$

It follows that if $E \in \mathcal{A}$ is the disjoint union of rectangles $A_1\times B_1, \ldots, A_n\times B_n$, and we set

$$\pi(E) = \sum_1^n \mu(A_j)\nu(B_j)$$

(with the usual convention that $0\cdot\infty = 0$), then $\pi$ is well defined on $\mathcal{A}$, since any two representations of $E$ as a finite disjoint union of rectangles have a common refinement. Moreover, $\pi$ is a premeasure on $\mathcal{A}$. According to Theorem 1.14, therefore, $\pi$ generates an outer measure on $X\times Y$ whose restriction to $\mathcal{M}\otimes\mathcal{N}$ is a measure that extends $\pi$. We call this measure the product of $\mu$ and $\nu$ and denote it by $\mu\times\nu$.

Now suppose $\mu$ and $\nu$ are σ-finite, say $X = \bigcup_1^\infty A_j$ and $Y = \bigcup_1^\infty B_k$ with $\mu(A_j) < \infty$ and $\nu(B_k) < \infty$. Then $X\times Y = \bigcup_{j,k} A_j\times B_k$ and $\mu\times\nu(A_j\times B_k) < \infty$, so $\mu\times\nu$ is also σ-finite. In this case, by Theorem 1.14, $\mu\times\nu$ is the unique measure on $\mathcal{M}\otimes\mathcal{N}$ such that $\mu\times\nu(A\times B) = \mu(A)\nu(B)$ for all rectangles $A\times B$.
The same construction works for any finite number of factors. That is, suppose $(X_j,\mathcal{M}_j,\mu_j)$ are measure spaces for $j = 1, \ldots, n$. If we define a rectangle to be a set of the form $A_1\times\cdots\times A_n$ with $A_j \in \mathcal{M}_j$, then the collection $\mathcal{A}$ of finite disjoint unions of rectangles is an algebra. The same procedure as above produces a measure $\mu_1\times\cdots\times\mu_n$ on $\mathcal{M}_1\otimes\cdots\otimes\mathcal{M}_n$ such that

$$\mu_1\times\cdots\times\mu_n(A_1\times\cdots\times A_n) = \prod_1^n \mu_j(A_j).$$

Moreover, if the $\mu_j$'s are σ-finite, so that the extension from $\mathcal{A}$ to $\bigotimes\mathcal{M}_j$ is uniquely determined, the obvious associativity properties hold. For example, identify $X_1\times X_2\times X_3$ with $(X_1\times X_2)\times X_3$. Then:

- $\mathcal{M}_1\otimes\mathcal{M}_2\otimes\mathcal{M}_3 = (\mathcal{M}_1\otimes\mathcal{M}_2)\otimes\mathcal{M}_3$, since the former is generated by sets of the form $A_1\times A_2\times A_3$ with $A_j \in \mathcal{M}_j$, and the latter by sets of the form $B\times A_3$ with $B \in \mathcal{M}_1\otimes\mathcal{M}_2$ and $A_3 \in \mathcal{M}_3$;
- $\mu_1\times\mu_2\times\mu_3 = (\mu_1\times\mu_2)\times\mu_3$, since they agree on sets of the form $A_1\times A_2\times A_3$, and hence in general by uniqueness.

Details are left to the reader (Exercise 45). All of our results below have obvious extensions to products with $n$ factors, but we shall stick to the case $n = 2$ for simplicity.

We return to the case of two measure spaces $(X,\mathcal{M},\mu)$ and $(Y,\mathcal{N},\nu)$. If $E \subset X\times Y$, for $x \in X$ and $y \in Y$ we define the $x$-section $E_x$ and the $y$-section $E^y$ of $E$ by

$$E_x = \{y \in Y : (x,y) \in E\}, \qquad E^y = \{x \in X : (x,y) \in E\}.$$

Also, if $f$ is a function on $X\times Y$, we define the $x$-section $f_x$ and the $y$-section $f^y$ of $f$ by

$$f_x(y) = f^y(x) = f(x,y).$$

Thus, for example, $(\chi_E)_x = \chi_{E_x}$ and $(\chi_E)^y = \chi_{E^y}$.


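2.34 Proposition.
a. If $E \in \mathcal{M}\otimes\mathcal{N}$, then $E_x \in \mathcal{N}$ for all $x \in X$ and $E^y \in \mathcal{M}$ for all $y \in Y$.
b. If $f$ is $\mathcal{M}\otimes\mathcal{N}$-measurable, then $f_x$ is $\mathcal{N}$-measurable for all $x \in X$ and $f^y$ is $\mathcal{M}$-measurable for all $y \in Y$.


Proof. Let $\mathcal{R}$ be the collection of all subsets $E$ of $X\times Y$ such that $E_x \in \mathcal{N}$ for all $x$ and $E^y \in \mathcal{M}$ for all $y$. Then $\mathcal{R}$ obviously contains all rectangles (e.g., $(A\times B)_x = B$ if $x \in A$, and $(A\times B)_x = \varnothing$ otherwise). Since $(\bigcup_1^\infty E_j)_x = \bigcup_1^\infty (E_j)_x$ and $(E^c)_x = (E_x)^c$, and likewise for $y$-sections, $\mathcal{R}$ is a σ-algebra. Therefore $\mathcal{R} \supset \mathcal{M}\otimes\mathcal{N}$, which proves (a). Part (b) follows from (a), because $(f_x)^{-1}(B) = (f^{-1}(B))_x$ and $(f^y)^{-1}(B) = (f^{-1}(B))^y$. ∎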


Before proceeding further we need a technical lemma. We define a monotone class on a space $X$ to be a subset $\mathcal{C}$ of $\mathcal{P}(X)$ that is closed under countable increasing unions and countable decreasing intersections. That is, if $E_j \in \mathcal{C}$ and $E_1 \subset E_2 \subset \cdots$, then $\bigcup E_j \in \mathcal{C}$, and likewise for intersections. Clearly every σ-algebra is a monotone class. Also, the intersection of any family of monotone classes is a monotone class. So for any $\mathcal{E} \subset \mathcal{P}(X)$ there is a unique smallest monotone class containing $\mathcal{E}$, called the monotone class generated by $\mathcal{E}$.


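2.35 The Monotone Class Lemma. If $\mathcal{A}$ is an algebra of subsets of $X$, then the monotone class $\mathcal{C}$ generated by $\mathcal{A}$ coincides with the σ-algebra $\mathcal{M}$ generated by $\mathcal{A}$.


Proof. Since $\mathcal{M}$ is a monotone class, we have $\mathcal{C} \subset \mathcal{M}$. If we can show that $\mathcal{C}$ is a σ-algebra, we will also have $\mathcal{M} \subset \mathcal{C}$. To this end, for $E \in \mathcal{C}$ let us define

$$\mathcal{C}(E) = \{F \in \mathcal{C} : E\setminus F,\ F\setminus E, \text{ and } E\cap F \text{ are in } \mathcal{C}\}.$$

Clearly $\varnothing$ and $E$ are in $\mathcal{C}(E)$, and $E \in \mathcal{C}(F)$ iff $F \in \mathcal{C}(E)$. Also, it is easy to check that $\mathcal{C}(E)$ is a monotone class.

If $E \in \mathcal{A}$, then $F \in \mathcal{C}(E)$ for all $F \in \mathcal{A}$, because $\mathcal{A}$ is an algebra. That is, $\mathcal{A} \subset \mathcal{C}(E)$, and hence $\mathcal{C} \subset \mathcal{C}(E)$. Therefore, if $F \in \mathcal{C}$, then $F \in \mathcal{C}(E)$ for all $E \in \mathcal{A}$. But this means that $E \in \mathcal{C}(F)$ for all $E \in \mathcal{A}$, so that $\mathcal{A} \subset \mathcal{C}(F)$, and hence $\mathcal{C} \subset \mathcal{C}(F)$.

Conclusion: if $E, F \in \mathcal{C}$, then $E\setminus F$ and $E\cap F$ are in $\mathcal{C}$. Since $X \in \mathcal{A} \subset \mathcal{C}$, $\mathcal{C}$ is therefore an algebra. But then, if $\{E_j\}_1^\infty \subset \mathcal{C}$, we have $\bigcup_1^n E_j \in \mathcal{C}$ for all $n$. Since $\mathcal{C}$ is closed under countable increasing unions, it follows that $\bigcup_1^\infty E_j \in \mathcal{C}$. In short, $\mathcal{C}$ is a σ-algebra, and we are done. ∎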


We now come to the main results of this section, which relate integrals on $X\times Y$ to integrals on $X$ and $Y$.


2.36 Theorem. Suppose $(X,\mathcal{M},\mu)$ and $(Y,\mathcal{N},\nu)$ are σ-finite measure spaces. If $E \in \mathcal{M}\otimes\mathcal{N}$, then the functions $x \mapsto \nu(E_x)$ and $y \mapsto \mu(E^y)$ are measurable on $X$ and $Y$, respectively, and

$$\mu\times\nu(E) = \int \nu(E_x)\,d\mu(x) = \int \mu(E^y)\,d\nu(y).$$


Proof. First suppose that $\mu$ and $\nu$ are finite, and let $\mathcal{C}$ be the set of all $E \in \mathcal{M}\otimes\mathcal{N}$ for which the conclusions of the theorem are true. If $E = A\times B$, then $\nu(E_x) = \chi_A(x)\nu(B)$ and $\mu(E^y) = \mu(A)\chi_B(y)$, so clearly $E \in \mathcal{C}$. By additivity it follows that finite disjoint unions of rectangles are in $\mathcal{C}$. So by Lemma 2.35 it will suffice to show that $\mathcal{C}$ is a monotone class.

If $\{E_n\}$ is an increasing sequence in $\mathcal{C}$ and $E = \bigcup_1^\infty E_n$, then the functions $f_n(y) = \mu((E_n)^y)$ are measurable and increase pointwise to $f(y) = \mu(E^y)$. Hence $f$ is measurable, and by the monotone convergence theorem,

$$\int \mu(E^y)\,d\nu(y) = \lim \int \mu((E_n)^y)\,d\nu(y) = \lim \mu\times\nu(E_n) = \mu\times\nu(E).$$

Likewise $\mu\times\nu(E) = \int \nu(E_x)\,d\mu(x)$, so $E \in \mathcal{C}$.

Similarly, let $\{E_n\}$ be a decreasing sequence in $\mathcal{C}$ and $E = \bigcap_1^\infty E_n$. The function $y \mapsto \mu((E_1)^y)$ is in $L^1(\nu)$, because $\mu((E_1)^y) \le \mu(X) < \infty$ and $\nu(Y) < \infty$. So the dominated convergence theorem can be applied to show that $E \in \mathcal{C}$. Thus $\mathcal{C}$ is a monotone class, and the proof is complete for the case of finite measure spaces.
Finally, if $\mu$ and $\nu$ are σ-finite, we can write $X\times Y$ as the union of an increasing sequence $\{X_j\times Y_j\}$ of rectangles of finite measure. If $E \in \mathcal{M}\otimes\mathcal{N}$, the preceding argument applies to $E\cap(X_j\times Y_j)$ for each $j$ to give

$$\mu\times\nu(E\cap(X_j\times Y_j)) = \int \chi_{X_j}(x)\,\nu(E_x\cap Y_j)\,d\mu(x) = \int \chi_{Y_j}(y)\,\mu(E^y\cap X_j)\,d\nu(y).$$

A final application of the monotone convergence theorem then yields the desired result. ∎
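The following numerical sanity check of Theorem 2.36 uses our own choice of example. Take $\mu = \nu =$ Lebesgue measure on $[0,1]$ and $E = \{(x,y) : 0 \le y \le x \le 1\}$. Then $E_x = [0,x]$ and $E^y = [y,1]$, so the two section integrals are $\int_0^1 x\,dx$ and $\int_0^1 (1-y)\,dy$. Both equal $1/2$, which should match $\mu\times\nu(E)$. The grid count in the Python sketch below is only a crude approximation to the product measure, included for comparison.

```python
def nu_of_x_section(x):
    # E_x = {y in [0, 1] : y <= x} = [0, x] for x in [0, 1], so nu(E_x) = x
    return x


def mu_of_y_section(y):
    # E^y = {x in [0, 1] : x >= y} = [y, 1] for y in [0, 1], so mu(E^y) = 1 - y
    return 1.0 - y


def midpoint_01(fn, n=1000):
    """Composite midpoint rule on [0, 1] (exact for the linear integrands here)."""
    h = 1.0 / n
    return sum(fn((i + 0.5) * h) for i in range(n)) * h


def grid_area(n=400):
    """Crude estimate of (m x m)(E): count grid-cell centres lying in E."""
    h = 1.0 / n
    inside = sum(1 for i in range(n) for j in range(n)
                 if (j + 0.5) * h <= (i + 0.5) * h)   # centre (x_i, y_j) with y <= x
    return inside * h * h


print(midpoint_01(nu_of_x_section), midpoint_01(mu_of_y_section), grid_area())
```

The grid estimate overshoots by about $1/(2n)$, because it counts the whole diagonal row of cells.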


2.37 The Fubini-Tonelli Theorem. Suppose that $(X,\mathcal{M},\mu)$ and $(Y,\mathcal{N},\nu)$ are σ-finite measure spaces.


a. (Tonelli) If $f \in L^+(X\times Y)$, then the functions $g(x) = \int f_x\,d\nu$ and $h(y) = \int f^y\,d\mu$ are in $L^+(X)$ and $L^+(Y)$, respectively, and

$$\int f\,d(\mu\times\nu) = \int\left[\int f(x,y)\,d\nu(y)\right]d\mu(x) = \int\left[\int f(x,y)\,d\mu(x)\right]d\nu(y). \tag{2.38}$$

b. (Fubini) If $f \in L^1(\mu\times\nu)$, then $f_x \in L^1(\nu)$ for a.e. $x \in X$ and $f^y \in L^1(\mu)$ for a.e. $y \in Y$. The a.e.-defined functions $g(x) = \int f_x\,d\nu$ and $h(y) = \int f^y\,d\mu$ are in $L^1(\mu)$ and $L^1(\nu)$, respectively, and (2.38) holds.


Proof. Tonelli's theorem reduces to Theorem 2.36 in case $f$ is a characteristic function, and it therefore holds for nonnegative simple functions by linearity. If $f \in L^+(X\times Y)$, let $\{f_n\}$ be a sequence of simple functions that increase pointwise to $f$ as in Theorem 2.10. The monotone convergence theorem implies, first, that the corresponding $g_n$ and $h_n$ increase to $g$ and $h$ (so that $g$ and $h$ are measurable), and second, that

$$\int g\,d\mu = \lim \int g_n\,d\mu = \lim \int f_n\,d(\mu\times\nu) = \int f\,d(\mu\times\nu),$$

$$\int h\,d\nu = \lim \int h_n\,d\nu = \lim \int f_n\,d(\mu\times\nu) = \int f\,d(\mu\times\nu),$$

which is (2.38). This establishes Tonelli's theorem. It also shows that if $f \in L^+(X\times Y)$ and $\int f\,d(\mu\times\nu) < \infty$, then $g < \infty$ a.e. and $h < \infty$ a.e.; that is, $f_x \in L^1(\nu)$ for a.e. $x$ and $f^y \in L^1(\mu)$ for a.e. $y$. If $f \in L^1(\mu\times\nu)$, the conclusion of Fubini's theorem then follows by applying these results to the positive and negative parts of the real and imaginary parts of $f$. ∎


A few remarks are in order: 


• We shall usually omit the brackets in the iterated integrals in (2.38), thus:

$$\int\left[\int f(x,y)\,d\mu(x)\right]d\nu(y) = \iint f(x,y)\,d\mu(x)\,d\nu(y) = \iint f\,d\mu\,d\nu.$$
• The hypothesis of σ-finiteness is necessary; see Exercise 46.


• The hypothesis $f \in L^+(X\times Y)$ or $f \in L^1(\mu\times\nu)$ is necessary, in two respects. First, it is possible for $f_x$ and $f^y$ to be measurable for all $x, y$, and for the iterated integrals $\iint f\,d\mu\,d\nu$ and $\iint f\,d\nu\,d\mu$ to exist, even if $f$ is not $\mathcal{M}\otimes\mathcal{N}$-measurable. However, the iterated integrals need not then be equal; see Exercise 47. Second, if $f$ is not nonnegative, it is possible for $f_x$ and $f^y$ to be integrable for all $x, y$, and for the iterated integrals $\iint f\,d\mu\,d\nu$ and $\iint f\,d\nu\,d\mu$ to exist, even if $\int |f|\,d(\mu\times\nu) = \infty$. But again, the iterated integrals need not be equal; see Exercise 48.


• The Fubini and Tonelli theorems are frequently used in tandem. Typically one wishes to reverse the order of integration in a double integral $\iint f\,d\mu\,d\nu$. First one verifies that $\int |f|\,d(\mu\times\nu) < \infty$ by using Tonelli's theorem to evaluate this integral as an iterated integral. Then one applies Fubini's theorem to conclude that $\iint f\,d\mu\,d\nu = \iint f\,d\nu\,d\mu$. For examples, see the exercises in §2.6.
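The counterexample of Exercise 48 can be explored with finite sums. There $\mu = \nu =$ counting measure on $\mathbb{N} = \{1, 2, \ldots\}$, so integrals are sums. In the Python sketch below (helper names ours), each inner sum is computed exactly, since only finitely many terms are nonzero, and only the outer sum is truncated. Summing over $m$ first gives 0 in every column. Summing over $n$ first gives 1 in the first row and 0 elsewhere. Meanwhile the total mass of $|f|$ on $\{1,\ldots,T\}^2$ is $2T - 1$, which is unbounded, consistent with $\int |f|\,d(\mu\times\nu) = \infty$.

```python
def f(m, n):
    """Exercise 48 on N x N: 1 on the diagonal, -1 where m = n + 1, 0 elsewhere."""
    if m == n:
        return 1
    if m == n + 1:
        return -1
    return 0


def inner_over_m(n):
    # Sum over m for fixed n; only m = n and m = n + 1 can be nonzero
    return sum(f(m, n) for m in range(1, n + 2))


def inner_over_n(m):
    # Sum over n for fixed m; only n = m and n = m - 1 (when m >= 2) can be nonzero
    return sum(f(m, n) for n in range(1, m + 1))


def iterated(order, terms=300):
    """Partial sum (first `terms` outer terms) of an iterated sum; inner sums are exact."""
    if order == "m_first":
        return sum(inner_over_m(n) for n in range(1, terms + 1))
    return sum(inner_over_n(m) for m in range(1, terms + 1))


def abs_mass(T):
    # Sum of |f| over the square {1..T} x {1..T}; equals 2T - 1, unbounded in T
    return sum(abs(f(m, n)) for m in range(1, T + 1) for n in range(1, T + 1))


print(iterated("m_first"), iterated("n_first"), abs_mass(200))  # 0 1 399
```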


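Even if $\mu$ and $\nu$ are complete, $\mu\times\nu$ is almost never complete. Indeed, suppose that there is a nonempty $A \in \mathcal{M}$ with $\mu(A) = 0$ and that $\mathcal{N} \ne \mathcal{P}(Y)$. (This is the case with $\mu = \nu =$ Lebesgue measure on $\mathbb{R}$, for example.) If $E \in \mathcal{P}(Y)\setminus\mathcal{N}$, then $A\times E \notin \mathcal{M}\otimes\mathcal{N}$ by Proposition 2.34. But $A\times E \subset A\times Y$, and $\mu\times\nu(A\times Y) = 0$.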

If one wishes to work with complete measures, of course, one can consider the completion of $\mu\times\nu$. In this setting the relationship between the measurability of a function on $X\times Y$ and the measurability of its $x$-sections and $y$-sections is not so simple. However, the Fubini-Tonelli theorem is still valid when suitably reformulated:


2.39 The Fubini-Tonelli Theorem for Complete Measures. Let $(X,\mathcal{M},\mu)$ and $(Y,\mathcal{N},\nu)$ be complete, σ-finite measure spaces, and let $(X\times Y,\mathcal{L},\lambda)$ be the completion of $(X\times Y,\mathcal{M}\otimes\mathcal{N},\mu\times\nu)$. Suppose $f$ is $\mathcal{L}$-measurable and either (a) $f \ge 0$ or (b) $f \in L^1(\lambda)$. Then $f_x$ is $\mathcal{N}$-measurable for a.e. $x$ and $f^y$ is $\mathcal{M}$-measurable for a.e. $y$, and in case (b) $f_x$ and $f^y$ are also integrable for a.e. $x$ and $y$. Moreover, $x \mapsto \int f_x\,d\nu$ and $y \mapsto \int f^y\,d\mu$ are measurable, and in case (b) also integrable, and

$$\int f\,d\lambda = \iint f(x,y)\,d\mu(x)\,d\nu(y) = \iint f(x,y)\,d\nu(y)\,d\mu(x).$$

This theorem is a fairly easy corollary of Theorem 2.37; the proof is outlined in Exercise 49.


Exercises 

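45. If $(X_j,\mathcal{M}_j)$ is a measurable space for $j = 1, 2, 3$, then $\bigotimes_1^3 \mathcal{M}_j = (\mathcal{M}_1\otimes\mathcal{M}_2)\otimes\mathcal{M}_3$. Moreover, if $\mu_j$ is a σ-finite measure on $(X_j,\mathcal{M}_j)$, then $\mu_1\times\mu_2\times\mu_3 = (\mu_1\times\mu_2)\times\mu_3$.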


46. Let $X = Y = [0,1]$, $\mathcal{M} = \mathcal{N} = \mathcal{B}_{[0,1]}$, $\mu =$ Lebesgue measure, and $\nu =$ counting measure. If $D = \{(x,x) : x \in [0,1]\}$ is the diagonal in $X\times Y$, then $\iint \chi_D\,d\mu\,d\nu$, $\iint \chi_D\,d\nu\,d\mu$, and $\int \chi_D\,d(\mu\times\nu)$ are all unequal. (To compute $\int \chi_D\,d(\mu\times\nu) = \mu\times\nu(D)$, go back to the definition of $\mu\times\nu$.)


47. Let $X = Y$ be an uncountable linearly ordered set such that for each $x \in X$, $\{y \in X : y < x\}$ is countable. (Example: the set of countable ordinals.) Let $\mathcal{M} = \mathcal{N}$ be the σ-algebra of countable or co-countable sets, and let $\mu = \nu$ be defined on $\mathcal{M}$ by $\mu(A) = 0$ if $A$ is countable and $\mu(A) = 1$ if $A$ is co-countable. Let $E = \{(x,y) \in X\times X : y < x\}$. Then $E_x$ and $E^y$ are measurable for all $x, y$, and $\iint \chi_E\,d\mu\,d\nu$ and $\iint \chi_E\,d\nu\,d\mu$ exist but are not equal. (If one believes in the continuum hypothesis, one can take $X = [0,1]$ [with a nonstandard ordering] and thus obtain a set $E \subset [0,1]^2$ such that $E_x$ is countable and $E^y$ is co-countable [in particular, Borel] for all $x, y$, but $E$ is not Lebesgue measurable.)


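48. Let $X = Y = \mathbb{N}$, $\mathcal{M} = \mathcal{N} = \mathcal{P}(\mathbb{N})$, and $\mu = \nu =$ counting measure. Define $f(m,n) = 1$ if $m = n$, $f(m,n) = -1$ if $m = n + 1$, and $f(m,n) = 0$ otherwise. Then $\int |f|\,d(\mu\times\nu) = \infty$, and $\iint f\,d\mu\,d\nu$ and $\iint f\,d\nu\,d\mu$ exist and are unequal.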


49. Prove Theorem 2.39 by using Theorem 2.37 and Proposition 2.12 together with the following lemmas.
a. If $E \in \mathcal{M}\otimes\mathcal{N}$ and $\mu\times\nu(E) = 0$, then $\nu(E_x) = \mu(E^y) = 0$ for a.e. $x$ and $y$.
b. If $f$ is $\mathcal{L}$-measurable and $f = 0$ $\lambda$-a.e., then $f_x$ and $f^y$ are integrable for a.e. $x$ and $y$, and $\int f_x\,d\nu = \int f^y\,d\mu = 0$ for a.e. $x$ and $y$. (Here the completeness of $\mu$ and $\nu$ is needed.)


50. Suppose $(X,\mathcal{M},\mu)$ is a σ-finite measure space and $f\in L^+(X)$. Let

$$G_f = \{(x,y)\in X\times[0,\infty] : y\le f(x)\}.$$

Then $G_f$ is $\mathcal{M}\otimes\mathcal{B}_{\mathbb{R}}$-measurable and $\mu\times m(G_f) = \int f\,d\mu$; the same is also true if the inequality $y\le f(x)$ in the definition of $G_f$ is replaced by $y < f(x)$. (To show measurability of $G_f$, note that the map $(x,y)\mapsto f(x) - y$ is the composition of $(x,y)\mapsto(f(x),y)$ and $(z,y)\mapsto z - y$.) This is the definitive statement of the familiar theorem from calculus, “the integral of a function is the area under its graph.”
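Exercise 50 can be checked exactly on a finite set with point masses, a special case of a σ-finite space chosen here only for illustration. Slicing $G_f$ vertically gives $\int m((G_f)_x)\,d\mu(x) = \int f\,d\mu$, since $(G_f)_x = [0, f(x)]$; slicing horizontally gives $\int_0^\infty\mu(\{x : f(x)\ge y\})\,dy$. Tonelli's theorem says both equal $\mu\times m(G_f)$. The helper names are hypothetical; a minimal sketch with exact rational arithmetic:

```python
from fractions import Fraction

def integral(values, weights):
    """∫ f dμ on a finite set: point x_i has mass weights[i] and f(x_i) = values[i] >= 0."""
    return sum(w * v for v, w in zip(values, weights))

def area_by_horizontal_slices(values, weights):
    """∫_0^∞ μ({x : f(x) >= y}) dy for f >= 0 on the same finite set.
    If lo < hi are consecutive levels in {0} ∪ {values}, then {f >= y} = {f >= hi}
    for every y in (lo, hi], so the integrand is constant on that interval."""
    levels = sorted(set(values) | {0})
    total = 0
    for lo, hi in zip(levels, levels[1:]):
        mass = sum(w for v, w in zip(values, weights) if v >= hi)
        total += mass * (hi - lo)
    return total

values = [3, Fraction(1, 2), 0, 2]
weights = [1, 2, 5, Fraction(1, 3)]
print(integral(values, weights), area_by_horizontal_slices(values, weights))  # 14/3 14/3
```

Using `Fraction` keeps the comparison exact, so the two slicings agree as numbers rather than up to rounding.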


51. Let $(X,\mathcal{M},\mu)$ and $(Y,\mathcal{N},\nu)$ be arbitrary measure spaces (not necessarily σ-finite).
a. If $f: X\to\mathbb{C}$ is $\mathcal{M}$-measurable, $g: Y\to\mathbb{C}$ is $\mathcal{N}$-measurable, and $h(x,y) = f(x)g(y)$, then $h$ is $\mathcal{M}\otimes\mathcal{N}$-measurable.
b. If $f\in L^1(\mu)$ and $g\in L^1(\nu)$, then $h\in L^1(\mu\times\nu)$ and $\int h\,d(\mu\times\nu) = \big[\int f\,d\mu\big]\big[\int g\,d\nu\big]$.


52. The Fubini-Tonelli theorem is valid when $(X,\mathcal{M},\mu)$ is an arbitrary measure space and $Y$ is a countable set, $\mathcal{N} = \mathcal{P}(Y)$, and $\nu$ is counting measure on $Y$. (Cf. Theorems 2.15 and 2.25.)
2.6 THE n-DIMENSIONAL LEBESGUE INTEGRAL


Lebesgue measure $m^n$ on $\mathbb{R}^n$ is the completion of the $n$-fold product of Lebesgue measure on $\mathbb{R}$ with itself, that is, the completion of $m\times\cdots\times m$ on $\mathcal{B}_{\mathbb{R}}\otimes\cdots\otimes\mathcal{B}_{\mathbb{R}} = \mathcal{B}_{\mathbb{R}^n}$, or equivalently the completion of $m\times\cdots\times m$ on $\mathcal{L}\otimes\cdots\otimes\mathcal{L}$. The domain $\mathcal{L}^n$ of $m^n$ is the class of Lebesgue measurable sets in $\mathbb{R}^n$; sometimes we shall also consider $m^n$ as a measure on the smaller domain $\mathcal{B}_{\mathbb{R}^n}$. When there is no danger of confusion, we shall usually omit the superscript $n$ and write $m$ for $m^n$, and as in the case $n = 1$, we shall usually write $\int f(x)\,dx$ for $\int f\,dm$.

We begin by establishing the extensions of some of the results in §1.5 to the $n$-dimensional case. In what follows, if $E = \prod_1^n E_j$ is a rectangle in $\mathbb{R}^n$, we shall refer to the sets $E_j\subset\mathbb{R}$ as the sides of $E$.


2.40 Theorem. Suppose $E\in\mathcal{L}^n$.
a. $m(E) = \inf\{m(U) : U\supset E,\ U\text{ open}\} = \sup\{m(K) : K\subset E,\ K\text{ compact}\}$.
b. $E = A_1\cup N_1 = A_2\setminus N_2$ where $A_1$ is an $F_\sigma$ set, $A_2$ is a $G_\delta$ set, and $m(N_1) = m(N_2) = 0$.
c. If $m(E) < \infty$, for any $\epsilon > 0$ there is a finite collection $\{R_j\}_1^N$ of disjoint rectangles whose sides are intervals such that $m(E\,\triangle\,\bigcup_1^N R_j) < \epsilon$.


Proof. By the definition of product measures, if $E\in\mathcal{L}^n$ and $\epsilon > 0$ there is a countable family $\{T_j\}$ of rectangles such that $E\subset\bigcup_1^\infty T_j$ and $\sum_1^\infty m(T_j) < m(E) + \epsilon$. For each $j$, by applying Theorem 1.18 to the sides of $T_j$ we can find a rectangle $U_j\supset T_j$ whose sides are open sets such that $m(U_j) < m(T_j) + \epsilon 2^{-j}$. If $U = \bigcup_1^\infty U_j$, then $U$ is open and $m(U)\le\sum_1^\infty m(U_j) < m(E) + 2\epsilon$. This proves the first equation in part (a); the second one, and part (b), then follow as in the proofs of Theorems 1.18 and 1.19. Next, if $m(E) < \infty$, then $m(U_j) < \infty$ for all $j$. Since the sides of $U_j$ are countable unions of open intervals, by taking suitable finite subunions we obtain rectangles $V_j\subset U_j$ whose sides are finite unions of intervals such that $m(V_j) > m(U_j) - \epsilon 2^{-j}$. Since $\sum_1^\infty m(U_j) < \infty$, if $N$ is sufficiently large we have $m(\bigcup_{N+1}^\infty U_j) < \epsilon$, and hence

$$m\Big(E\setminus\bigcup_1^N V_j\Big)\le m\Big(\bigcup_1^N(U_j\setminus V_j)\Big) + m\Big(\bigcup_{N+1}^\infty U_j\Big) < 2\epsilon$$

and

$$m\Big(\bigcup_1^N V_j\setminus E\Big)\le m\Big(\bigcup_1^\infty U_j\setminus E\Big) = m(U) - m(E) < 2\epsilon,$$

so that $m(E\,\triangle\,\bigcup_1^N V_j) < 4\epsilon$. Since $\bigcup_1^N V_j$ can be expressed as a finite disjoint union of rectangles whose sides are intervals, we have proved (c). ∎


2.41 Theorem. If $f\in L^1(m)$ and $\epsilon > 0$, there is a simple function $\phi = \sum_1^N a_j\chi_{R_j}$, where each $R_j$ is a product of intervals, such that $\int|f - \phi| < \epsilon$, and there is a continuous function $g$ that vanishes outside a bounded set such that $\int|f - g| < \epsilon$.
Proof. As in the proof of Theorem 2.26, approximate $f$ by simple functions, then use Theorem 2.40c to approximate the latter by functions $\phi$ of the desired form. Finally, approximate such $\phi$'s by continuous functions by applying an obvious generalization of the argument in the proof of Theorem 2.26. ∎


2.42 Theorem. Lebesgue measure is translation-invariant. More precisely, for $a\in\mathbb{R}^n$ define $\tau_a:\mathbb{R}^n\to\mathbb{R}^n$ by $\tau_a(x) = x + a$.
a. If $E\in\mathcal{L}^n$, then $\tau_a(E)\in\mathcal{L}^n$ and $m(\tau_a(E)) = m(E)$.
b. If $f:\mathbb{R}^n\to\mathbb{C}$ is Lebesgue measurable, then so is $f\circ\tau_a$. Moreover, if either $f\ge 0$ or $f\in L^1(m)$, then $\int(f\circ\tau_a)\,dm = \int f\,dm$.


Proof. Since $\tau_a$ and its inverse $\tau_{-a}$ are continuous, they preserve the class of Borel sets. The formula $m(\tau_a(E)) = m(E)$ follows easily from the one-dimensional result (Theorem 1.21) if $E$ is a rectangle, and it then follows for general Borel sets since $m$ is determined by its action on rectangles (the uniqueness in Theorem 1.14). In particular, the collection of Borel sets $E$ such that $m(E) = 0$ is invariant under $\tau_a$. Assertion (a) now follows immediately.

If $f$ is Lebesgue measurable and $B$ is a Borel set in $\mathbb{C}$, we have $f^{-1}(B) = E\cup N$ where $E$ is Borel and $m(N) = 0$. But $\tau_a^{-1}(E)$ is Borel and $m(\tau_a^{-1}(N)) = 0$, so $(f\circ\tau_a)^{-1}(B)\in\mathcal{L}^n$ and $f\circ\tau_a$ is Lebesgue measurable. The equality $\int(f\circ\tau_a)\,dm = \int f\,dm$ reduces to the equality $m(\tau_{-a}(E)) = m(E)$ when $f = \chi_E$. It is then true for simple functions by linearity, and hence for nonnegative measurable functions by the definition of the integral. Taking positive and negative parts of real and imaginary parts then yields the result for $f\in L^1(m)$. ∎


Let us now compare Lebesgue measure on $\mathbb{R}^n$ to the more naive theory of $n$-dimensional measure usually found in advanced calculus books. In this discussion, a cube in $\mathbb{R}^n$ is a Cartesian product of $n$ closed intervals whose side lengths are all equal.

For $k\in\mathbb{Z}$, let $\mathcal{Q}_k$ be the collection of cubes whose side length is $2^{-k}$ and whose vertices are in the lattice $(2^{-k}\mathbb{Z})^n$. (That is, $\prod_1^n[a_j,b_j]\in\mathcal{Q}_k$ iff $2^k a_j$ and $2^k b_j$ are integers and $b_j - a_j = 2^{-k}$ for all $j$.) Note that any two cubes in $\mathcal{Q}_k$ have disjoint interiors, and that the cubes in $\mathcal{Q}_{k+1}$ are obtained from the cubes in $\mathcal{Q}_k$ by bisecting the sides.

If $E\subset\mathbb{R}^n$, we define the inner and outer approximations to $E$ by the grid of cubes $\mathcal{Q}_k$ to be

$$\underline{A}(E,k) = \bigcup\{Q\in\mathcal{Q}_k : Q\subset E\},\qquad \overline{A}(E,k) = \bigcup\{Q\in\mathcal{Q}_k : Q\cap E\neq\varnothing\}.$$

(See Figure 2.2.) The measure of $\underline{A}(E,k)$ (in either the naive geometric sense or the Lebesgue sense) is just $2^{-nk}$ times the number of cubes in $\mathcal{Q}_k$ that lie in $\underline{A}(E,k)$, and we denote it by $m(\underline{A}(E,k))$; likewise for $m(\overline{A}(E,k))$. Also, the sets $\underline{A}(E,k)$ increase with $k$ while the sets $\overline{A}(E,k)$ decrease, because each cube in $\mathcal{Q}_k$ is a union of cubes in $\mathcal{Q}_{k+1}$. Hence the limits

$$\underline{\kappa}(E) = \lim_{k\to\infty}m(\underline{A}(E,k)),\qquad \overline{\kappa}(E) = \lim_{k\to\infty}m(\overline{A}(E,k))$$

exist. They are called the inner and outer content of $E$, and if they are equal, their common value $\kappa(E)$ is the Jordan content of $E$.

[Fig. 2.2: Approximations to the inner and outer content of a set.]

Two comments: First, Jordan content is usually defined using general rectangles whose sides are intervals rather than our dyadic cubes, but the result is the same. Second, although all the definitions above make sense for arbitrary $E\subset\mathbb{R}^n$, the theory of Jordan content is meaningful only if $E$ is bounded, for otherwise $\overline{\kappa}(E)$ always equals $\infty$.

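Let

$$\underline{A}(E) = \bigcup_{k=1}^\infty\underline{A}(E,k),\qquad \overline{A}(E) = \bigcap_{k=1}^\infty\overline{A}(E,k).$$

Then $\underline{A}(E)\subset E\subset\overline{A}(E)$, $\underline{A}(E)$ and $\overline{A}(E)$ are Borel sets, and $\underline{\kappa}(E) = m(\underline{A}(E))$ and $\overline{\kappa}(E) = m(\overline{A}(E))$. Thus the Jordan content of $E$ exists iff $m(\overline{A}(E)\setminus\underline{A}(E)) = 0$, which implies that $E$ is Lebesgue measurable and $m(E) = \kappa(E)$.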
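As a concrete illustration, the sketch below computes $m(\underline{A}(E,k))$ and $m(\overline{A}(E,k))$ for the closed unit disc $E\subset\mathbb{R}^2$, whose Jordan content (and Lebesgue measure) is $\pi$. The function name is hypothetical. A square of $\mathcal{Q}_k$ lies in $E$ iff its farthest corner is within distance 1 of the origin, and meets $E$ iff its nearest point is; since all corners are dyadic rationals, these tests are exact in floating point.

```python
import math

def dyadic_contents(k):
    """Return (m(inner A(E, k)), m(outer A(E, k))) for the closed unit disc E in R^2,
    using the squares of Q_k: side h = 2^-k, corners in (2^-k Z)^2."""
    h = 2.0 ** -k
    N = 2 ** k
    inner = outer = 0
    # Squares with index outside [-N-1, N] cannot meet E; indices -N-1 and N give
    # squares touching E only at boundary points such as (1, 0) and (-1, 0).
    for i in range(-N - 1, N + 1):
        for j in range(-N - 1, N + 1):
            x0, x1, y0, y1 = i * h, (i + 1) * h, j * h, (j + 1) * h
            # The farthest point of the square from the origin is a corner.
            far = max(x0 * x0, x1 * x1) + max(y0 * y0, y1 * y1)
            # The nearest point is obtained by clamping 0 into [x0, x1] and [y0, y1].
            nx, ny = min(max(0.0, x0), x1), min(max(0.0, y0), y1)
            if far <= 1.0:
                inner += 1          # Q is contained in E
            if nx * nx + ny * ny <= 1.0:
                outer += 1          # Q meets E
    return inner * h * h, outer * h * h

for k in range(1, 8):
    lo, hi = dyadic_contents(k)
    print(k, lo, math.pi, hi)
```

The printed inner values increase and the outer values decrease with $k$, squeezing $\pi$ from both sides, as the monotonicity argument above predicts.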

To clarify further the relationship between Lebesgue measure and the approximation process leading to Jordan content, we establish the following lemma. (The second part of the lemma will be used later.)


2.43 Lemma. If $U\subset\mathbb{R}^n$ is open, then $U = \underline{A}(U)$. Moreover, $U$ is a countable union of cubes with disjoint interiors.


Proof. If $x\in U$, let $\delta = \inf\{|y - x| : y\notin U\}$, which is positive since $U$ is open. If $Q$ is a cube in $\mathcal{Q}_k$ that contains $x$, then every $y\in Q$ is at a distance at most $2^{-k}\sqrt{n}$ from $x$ (the worst case being when $|x_j - y_j| = 2^{-k}$ for all $j$), so we will have $Q\subset U$ provided $k$ is large enough so that $2^{-k}\sqrt{n} < \delta$. But then $x\in\underline{A}(U,k)\subset\underline{A}(U)$.

This shows that $\underline{A}(U) = U$, and the second assertion follows by writing $\underline{A}(U) = \underline{A}(U,0)\cup\bigcup_1^\infty[\underline{A}(U,k)\setminus\underline{A}(U,k-1)]$. $\underline{A}(U,0)$ is a (countable) union of cubes in $\mathcal{Q}_0$, and for $k\ge 1$, the closure of $\underline{A}(U,k)\setminus\underline{A}(U,k-1)$ is a (countable) union of cubes in $\mathcal{Q}_k$. These cubes all have disjoint interiors, and the result follows. ∎
Lemma 2.43 immediately implies that the Lebesgue measure of any open set is equal to its inner content. On the other hand, suppose that $F\subset\mathbb{R}^n$ is compact. We can find a large cube, say $Q_0 = \{x : \max_j|x_j|\le 2^M\}$, whose interior $\operatorname{int}(Q_0)$ contains $F$. If $Q\in\mathcal{Q}_k$ and $Q\subset Q_0$ then either $Q\cap F\neq\varnothing$ or $Q\subset(Q_0\setminus F)$, so $m(\overline{A}(F,k)) + m(\underline{A}(Q_0\setminus F,k)) = m(Q_0)$. Letting $k\to\infty$, we see that $\overline{\kappa}(F) + \underline{\kappa}(Q_0\setminus F) = m(Q_0)$. But $Q_0\setminus F$ is the union of the open set $\operatorname{int}(Q_0)\setminus F$ and the boundary of $Q_0$, which has content zero, so that $\underline{\kappa}(Q_0\setminus F) = \underline{\kappa}(\operatorname{int}(Q_0)\setminus F) = m(Q_0\setminus F)$. It follows that the Lebesgue measure of any compact set is equal to its outer content.

Combining these results with Theorem 2.40a, we can see exactly how Lebesgue measure compares to Jordan content. The Jordan content of $E$ is defined by approximating $E$ from the inside and the outside by finite unions of cubes. The Lebesgue measure of $E$, on the other hand, is given by a two-step approximation process: First one approximates $E$ from the outside by open sets and from the inside by compact sets, and then approximates the open sets from the inside and the compact sets from the outside by finite unions of cubes. The Lebesgue measurable sets are precisely those for which these outer-inner and inner-outer approximations give the same answer in the limit. (Cf. Exercise 19 in §1.4.)





We now investigate the behavior of the Lebesgue integral under linear transformations. We identify a linear map $T:\mathbb{R}^n\to\mathbb{R}^n$ with the matrix $(T_{ij}) = (e_i\cdot Te_j)$ where $\{e_j\}$ is the standard basis for $\mathbb{R}^n$. We denote the determinant of this matrix by $\det T$ and recall that $\det(T\circ S) = (\det T)(\det S)$. Furthermore, we employ the standard notation $GL(n,\mathbb{R})$ (the “general linear” group) for the group of invertible linear transformations of $\mathbb{R}^n$. We shall need the fact from elementary linear algebra that every $T\in GL(n,\mathbb{R})$ can be written as the product of finitely many transformations of three “elementary” types. The first type multiplies one coordinate by a nonzero constant $c$ and leaves the others fixed; the second type adds a multiple of one coordinate to some other coordinate and leaves all but the latter fixed; the third type interchanges two coordinates and leaves the others fixed. In symbols:

$$T_1(x_1,\ldots,x_j,\ldots,x_n) = (x_1,\ldots,cx_j,\ldots,x_n)\qquad(c\neq 0),$$
$$T_2(x_1,\ldots,x_j,\ldots,x_n) = (x_1,\ldots,x_j + cx_k,\ldots,x_n)\qquad(k\neq j),$$
$$T_3(x_1,\ldots,x_j,\ldots,x_k,\ldots,x_n) = (x_1,\ldots,x_k,\ldots,x_j,\ldots,x_n).$$
That every invertible transformation is a product of transformations of these three 
types is simply the fact that every nonsingular matrix can be row-reduced to the 
identity matrix. 
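The row-reduction fact can be checked mechanically. The following exact-arithmetic sketch (all helper names are ours) row-reduces $T$ to the identity: if $E_k\cdots E_1T = I$, then $T = E_1^{-1}\cdots E_k^{-1}$, and each $E_i^{-1}$ is again of type $T_1$, $T_2$, or $T_3$, written as a matrix acting on column vectors. It confirms that the factors multiply back to $T$ and that the product of their determinants ($c$, $1$, $-1$ respectively) is $\det T$, which is the bookkeeping used in the proof of Theorem 2.44.

```python
from fractions import Fraction

def identity(n):
    return [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]

def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def det(A):
    """Laplace expansion along the first row; adequate for small n."""
    if len(A) == 1:
        return A[0][0]
    return sum((-1) ** j * A[0][j] * det([row[:j] + row[j + 1:] for row in A[1:]])
               for j in range(len(A)))

def elementary_factors(T):
    """Row-reduce T to I. Since E_k ... E_1 T = I gives T = E_1^{-1} ... E_k^{-1},
    return the E_i^{-1} in that order as (type, matrix, determinant) triples."""
    n = len(T)
    A = [[Fraction(x) for x in row] for row in T]
    factors = []
    for col in range(n):
        pivot = next((r for r in range(col, n) if A[r][col] != 0), None)
        if pivot is None:
            raise ValueError("T is not invertible")
        if pivot != col:                   # type T3: interchange two coordinates
            A[col], A[pivot] = A[pivot], A[col]
            P = identity(n)
            P[col], P[pivot] = P[pivot], P[col]
            factors.append(("T3", P, Fraction(-1)))
        c = A[col][col]
        if c != 1:                         # type T1: multiply coordinate col by c != 0
            A[col] = [x / c for x in A[col]]
            D = identity(n)
            D[col][col] = c
            factors.append(("T1", D, c))
        for r in range(n):
            a = A[r][col]
            if r != col and a != 0:        # type T2: add a * (coordinate col) to coordinate r
                A[r] = [x - a * y for x, y in zip(A[r], A[col])]
                S = identity(n)
                S[r][col] = a
                factors.append(("T2", S, Fraction(1)))
    return factors

T = [[0, 2, 1], [1, 1, 0], [3, 0, 4]]
factors = elementary_factors(T)
product, det_product = identity(3), Fraction(1)
for kind, M, d in factors:
    product = matmul(product, M)
    det_product *= d
print(product == T, det_product, det(T))  # True -11 -11
```

Exact `Fraction` arithmetic is used so that the reconstructed product can be compared with $T$ entry by entry without rounding error.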
2.44 Theorem. Suppose $T\in GL(n,\mathbb{R})$.
a. If $f$ is a Lebesgue measurable function on $\mathbb{R}^n$, so is $f\circ T$. If $f\ge 0$ or $f\in L^1(m)$, then

$$\int f(x)\,dx = |\det T|\int f\circ T(x)\,dx. \tag{2.45}$$

b. If $E\in\mathcal{L}^n$, then $T(E)\in\mathcal{L}^n$ and $m(T(E)) = |\det T|\,m(E)$.
Proof. First suppose that $f$ is Borel measurable. Then $f\circ T$ is Borel measurable since $T$ is continuous. If (2.45) is true for the transformations $T$ and $S$, it is also true for $T\circ S$, since

$$\int f(x)\,dx = |\det T|\int f\circ T(x)\,dx = |\det T|\,|\det S|\int(f\circ T)\circ S(x)\,dx = |\det(T\circ S)|\int f\circ(T\circ S)(x)\,dx.$$

Hence it suffices to prove (2.45) when $T$ is of the types $T_1, T_2, T_3$ described above. But this is a simple consequence of the Fubini-Tonelli theorem. For $T_3$ we interchange the order of integration in the variables $x_j$ and $x_k$, and for $T_1$ and $T_2$ we integrate first with respect to $x_j$ and use the one-dimensional formulas

$$\int f(t)\,dt = |c|\int f(ct)\,dt,\qquad \int f(t + a)\,dt = \int f(t)\,dt,$$

which follow from Theorem 1.21. Since it is easily verified that $\det T_1 = c$, $\det T_2 = 1$, and $\det T_3 = -1$, (2.45) is proved. Moreover, if $E$ is a Borel set, so is $T(E)$ (since $T^{-1}$ is continuous), and by taking $f = \chi_{T(E)}$, we obtain $m(T(E)) = |\det T|\,m(E)$. In particular, the class of Borel null sets is invariant under $T$ and $T^{-1}$, and hence so is $\mathcal{L}^n$. The result for Lebesgue measurable functions and sets now follows as in the proof of Theorem 2.42. ∎


2.46 Corollary. Lebesgue measure is invariant under rotations.


Proof. Rotations are linear maps satisfying $TT^* = I$ where $T^*$ is the transpose of $T$. Since $\det T = \det T^*$, this condition implies that $|\det T| = 1$. ∎


Next we shall generalize Theorem 2.44 to differentiable maps. This result will 
not be used elsewhere in this book and may be omitted on a first reading. We shall 
prove a generalization of it, by somewhat different methods, in §11.2. 

Let $G = (g_1,\ldots,g_n)$ be a map from an open set $\Omega\subset\mathbb{R}^n$ into $\mathbb{R}^n$ whose components $g_j$ are of class $C^1$, i.e., have continuous first-order partial derivatives. We denote by $D_xG$ the linear map defined by the matrix $((\partial g_i/\partial x_j)(x))$ of partial derivatives at $x$. (Observe that if $G$ is linear, then $D_xG = G$ for all $x$.) $G$ is called a $C^1$ diffeomorphism if $G$ is injective and $D_xG$ is invertible for all $x\in\Omega$. In this case, the inverse function theorem guarantees that $G^{-1}: G(\Omega)\to\Omega$ is also a $C^1$ diffeomorphism and that $D_x(G^{-1}) = [D_{G^{-1}(x)}G]^{-1}$ for all $x\in G(\Omega)$.


2.47 Theorem. Suppose that $\Omega$ is an open set in $\mathbb{R}^n$ and $G:\Omega\to\mathbb{R}^n$ is a $C^1$ diffeomorphism.
a. If $f$ is a Lebesgue measurable function on $G(\Omega)$, then $f\circ G$ is Lebesgue measurable on $\Omega$. If $f\ge 0$ or $f\in L^1(G(\Omega),m)$, then

$$\int_{G(\Omega)}f(x)\,dx = \int_\Omega f\circ G(x)\,|\det D_xG|\,dx.$$

b. If $E\subset\Omega$ and $E\in\mathcal{L}^n$, then $G(E)\in\mathcal{L}^n$ and $m(G(E)) = \int_E|\det D_xG|\,dx$.
Proof. It suffices to consider Borel measurable functions and sets. Since $G$ and $G^{-1}$ are both continuous, there are no measurability problems in this case, and the general case follows as in the proof of Theorem 2.42.

A bit of notation: For $x\in\mathbb{R}^n$ and $T = (T_{ij})\in GL(n,\mathbb{R})$, we set

$$\|x\| = \max_{1\le j\le n}|x_j|,\qquad \|T\| = \max_{1\le i\le n}\sum_{j=1}^n|T_{ij}|.$$

We then have $\|Tx\|\le\|T\|\,\|x\|$, and $\{x : \|x - a\|\le h\}$ is the cube of side length $2h$ centered at $a$.

Let $Q$ be a cube in $\Omega$, say $Q = \{x : \|x - a\|\le h\}$. By the mean value theorem, $g_j(x) - g_j(a) = \sum_{k=1}^n(x_k - a_k)(\partial g_j/\partial x_k)(y)$ for some $y$ (depending on $j$) on the line segment joining $x$ and $a$, so that for $x\in Q$, $\|G(x) - G(a)\|\le h\big(\sup_{y\in Q}\|D_yG\|\big)$. In other words, $G(Q)$ is contained in a cube of side length $\sup_{y\in Q}\|D_yG\|$ times that of $Q$, so that by Theorem 2.44, $m(G(Q))\le\big(\sup_{y\in Q}\|D_yG\|\big)^n m(Q)$. If $T\in GL(n,\mathbb{R})$, we can apply this formula with $G$ replaced by $T^{-1}\circ G$ together with Theorem 2.44 to obtain

$$m(G(Q)) = |\det T|\,m(T^{-1}(G(Q)))\le|\det T|\Big(\sup_{y\in Q}\|T^{-1}D_yG\|\Big)^n m(Q). \tag{2.48}$$


Since $D_yG$ is continuous in $y$, for any $\epsilon > 0$ we can choose $\delta > 0$ so that $\|(D_zG)^{-1}D_yG\|^n\le 1 + \epsilon$ if $y, z\in Q$ and $\|y - z\|\le\delta$. Let us now subdivide $Q$ into subcubes $Q_1,\ldots,Q_N$ whose interiors are disjoint, whose side lengths are at most $\delta$, and whose centers are $x_1,\ldots,x_N$. Applying (2.48) with $Q$ replaced by $Q_j$ and with $T = D_{x_j}G$, we obtain

$$m(G(Q))\le\sum_1^N m(G(Q_j))\le\sum_1^N|\det D_{x_j}G|\Big(\sup_{y\in Q_j}\|(D_{x_j}G)^{-1}D_yG\|\Big)^n m(Q_j)\le(1 + \epsilon)\sum_1^N|\det D_{x_j}G|\,m(Q_j).$$

This last sum is the integral of $\sum_1^N|\det D_{x_j}G|\,\chi_{Q_j}(x)$, which tends uniformly on $Q$ to $|\det D_xG|$ as $\delta\to 0$ since $D_xG$ is continuous. Thus, letting $\delta\to 0$ and $\epsilon\to 0$, we find that

$$m(G(Q))\le\int_Q|\det D_xG|\,dx.$$


We claim that this estimate holds with $Q$ replaced by any Borel set in $\Omega$. Indeed, if $U\subset\Omega$ is open, by Lemma 2.43 we can write $U = \bigcup_1^\infty Q_j$ where the $Q_j$'s are cubes with disjoint interiors. Since the boundaries of the cubes have Lebesgue measure zero, we have

$$m(G(U))\le\sum_1^\infty m(G(Q_j))\le\sum_1^\infty\int_{Q_j}|\det D_xG|\,dx = \int_U|\det D_xG|\,dx.$$

Moreover, if $E\subset\Omega$ is any Borel set of finite measure, by Theorem 2.40 there is a decreasing sequence of open sets $U_j\subset\Omega$ of finite measure such that $E\subset\bigcap_1^\infty U_j$ and $m(\bigcap_1^\infty U_j\setminus E) = 0$. Hence by the dominated convergence theorem,

$$m(G(E))\le m\Big(G\Big(\bigcap_1^\infty U_j\Big)\Big) = \lim m(G(U_j))\le\lim\int_{U_j}|\det D_xG|\,dx = \int_E|\det D_xG|\,dx.$$

Finally, since $m$ is σ-finite, it follows from this that $m(G(E))\le\int_E|\det D_xG|\,dx$ for any Borel set $E\subset\Omega$.
If $f = \sum_1^N a_j\chi_{A_j}$ is a nonnegative simple function on $G(\Omega)$, we therefore have

$$\int_{G(\Omega)}f(x)\,dx = \sum a_j m(A_j)\le\sum a_j\int_{G^{-1}(A_j)}|\det D_xG|\,dx = \int_\Omega f\circ G(x)\,|\det D_xG|\,dx.$$

Theorem 2.10 and the monotone convergence theorem then imply that

$$\int_{G(\Omega)}f(x)\,dx\le\int_\Omega f\circ G(x)\,|\det D_xG|\,dx$$

for any nonnegative measurable $f$. But the same reasoning applies with $G$ replaced by $G^{-1}$ and $f$ replaced by $(f\circ G)\,|\det DG|$, so that

$$\int_\Omega f\circ G(x)\,|\det D_xG|\,dx\le\int_{G(\Omega)}f\circ G\circ G^{-1}(x)\,|\det D_{G^{-1}(x)}G|\,|\det D_xG^{-1}|\,dx = \int_{G(\Omega)}f(x)\,dx.$$

This establishes (a) for $f\ge 0$, and the case $f\in L^1$ follows immediately. Since (b) is just the special case of (a) where $f = \chi_{G(E)}$, the proof is complete. ∎


Exercises 
53. Fill in the details of the proof of Theorem 2.41. 


54. How much of Theorem 2.44 remains valid if T is not invertible? 
55. Let $E = [0,1]\times[0,1]$. Investigate the existence and equality of $\int_E f\,dm^2$, $\int_0^1\int_0^1 f(x,y)\,dx\,dy$, and $\int_0^1\int_0^1 f(x,y)\,dy\,dx$ for the following $f$.

a. $f(x,y) = (x^2 - y^2)(x^2 + y^2)^{-2}$.

b. $f(x,y) = (1 - xy)^{-a}$ $(a > 0)$.

c. $f(x,y) = (x - \frac12)^{-3}$ if $0 < y < |x - \frac12|$, $f(x,y) = 0$ otherwise.

56. If $f$ is Lebesgue integrable on $(0,a)$ and $g(x) = \int_x^a t^{-1}f(t)\,dt$, then $g$ is integrable on $(0,a)$ and $\int_0^a g(x)\,dx = \int_0^a f(x)\,dx$.


57. Show that $\int_0^\infty e^{-sx}x^{-1}\sin x\,dx = \arctan(s^{-1})$ for $s > 0$ by integrating $e^{-sxy}\sin x$ with respect to $x$ and $y$. (It may be useful to recall that $\tan(\frac{\pi}{2} - \theta) = (\tan\theta)^{-1}$. Cf. Exercise 31d.)


58. Show that $\int_0^\infty e^{-sx}x^{-1}\sin^2x\,dx = \frac14\log(1 + 4s^{-2})$ for $s > 0$ by integrating $e^{-sx}\sin 2xy$ with respect to $x$ and $y$.

59. Let $f(x) = x^{-1}\sin x$.

a. Show that $\int_0^\infty|f(x)|\,dx = \infty$.

b. Show that $\lim_{b\to\infty}\int_0^b f(x)\,dx = \frac12\pi$ by integrating $e^{-xy}\sin x$ with respect to $x$ and $y$. (In view of part (a), some care is needed in passing to the limit as $b\to\infty$.)


60. $\Gamma(x)\Gamma(y)/\Gamma(x + y) = \int_0^1 t^{x-1}(1 - t)^{y-1}\,dt$ for $x, y > 0$. (Recall that $\Gamma$ was defined in §2.3. Write $\Gamma(x)\Gamma(y)$ as a double integral and use the argument of the exponential as a new variable of integration.)


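61. If $f$ is continuous on $[0,\infty)$, for $\alpha > 0$ and $x > 0$ let

$$I_\alpha f(x) = \frac{1}{\Gamma(\alpha)}\int_0^x(x - t)^{\alpha-1}f(t)\,dt.$$

$I_\alpha f$ is called the $\alpha$th fractional integral of $f$.
a. $I_{\alpha+\beta}f = I_\alpha(I_\beta f)$ for all $\alpha, \beta > 0$. (Use Exercise 60.)
b. If $n\in\mathbb{N}$, $I_nf$ is an $n$th-order antiderivative of $f$.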


2.7 INTEGRATION IN POLAR COORDINATES 


The most important nonlinear coordinate systems in $\mathbb{R}^2$ and $\mathbb{R}^3$ are polar coordinates ($x = r\cos\theta$, $y = r\sin\theta$) and spherical coordinates ($x = r\sin\phi\cos\theta$, $y = r\sin\phi\sin\theta$, $z = r\cos\phi$). Theorem 2.47, applied to these coordinates, yields the familiar formulas (loosely stated) $dx\,dy = r\,dr\,d\theta$ and $dx\,dy\,dz = r^2\sin\phi\,dr\,d\phi\,d\theta$. Similar coordinate systems exist in higher dimensions, but they become increasingly complicated as the dimension increases. (See Exercise 65.) For most purposes, however, it is sufficient to know that Lebesgue measure is effectively the product of the measure $r^{n-1}\,dr$ on $(0,\infty)$ and a certain “surface measure” on the unit sphere ($d\theta$ for $n = 2$, $\sin\phi\,d\phi\,d\theta$ for $n = 3$).
Our construction of this surface measure is motivated by a familiar fact from plane geometry. Namely, if $S_\theta$ is a sector of a disc of radius $r$ with central angle $\theta$ (i.e., the region in the disc contained between the two sides of the angle), the area $m(S_\theta)$ is proportional to $\theta$; in fact, $m(S_\theta) = \frac12r^2\theta$. This equation can be solved for $\theta$ and hence used to define the angular measure $\theta$ in terms of the area $m(S_\theta)$. The same idea works in higher dimensions: We shall define the surface measure of a subset of the unit sphere in terms of the Lebesgue measure of the corresponding sector of the unit ball.

We shall denote the unit sphere $\{x\in\mathbb{R}^n : |x| = 1\}$ by $S^{n-1}$. If $x\in\mathbb{R}^n\setminus\{0\}$, the polar coordinates of $x$ are

$$r = |x|\in(0,\infty),\qquad x' = \frac{x}{|x|}\in S^{n-1}.$$

The map $\Phi(x) = (r,x')$ is a continuous bijection from $\mathbb{R}^n\setminus\{0\}$ to $(0,\infty)\times S^{n-1}$ whose (continuous) inverse is $\Phi^{-1}(r,x') = rx'$. We denote by $m_*$ the Borel measure on $(0,\infty)\times S^{n-1}$ induced by $\Phi$ from Lebesgue measure on $\mathbb{R}^n$, that is, $m_*(E) = m(\Phi^{-1}(E))$. Moreover, we define the measure $\rho = \rho_n$ on $(0,\infty)$ by

$$\rho(E) = \int_E r^{n-1}\,dr.$$


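2.49 Theorem. There is a unique Borel measure $\sigma = \sigma_{n-1}$ on $S^{n-1}$ such that $m_* = \rho\times\sigma$. If $f$ is Borel measurable on $\mathbb{R}^n$ and $f\ge 0$ or $f\in L^1(m)$, then

$$\int_{\mathbb{R}^n}f(x)\,dx = \int_0^\infty\int_{S^{n-1}}f(rx')\,r^{n-1}\,d\sigma(x')\,dr. \tag{2.50}$$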


Proof. Equation (2.50), when $f$ is a characteristic function of a set, is merely a restatement of the equation $m_* = \rho\times\sigma$, and it follows for general $f$ by the usual linearity and approximation arguments. Hence we need only to construct $\sigma$.

If $E$ is a Borel set in $S^{n-1}$, for $a > 0$ let

$$E_a = \Phi^{-1}((0,a]\times E) = \{rx' : 0 < r\le a,\ x'\in E\}.$$

If (2.50) is to hold when $f = \chi_{E_1}$, we must have

$$m(E_1) = \int_0^1\int_E r^{n-1}\,d\sigma(x')\,dr = \sigma(E)\int_0^1 r^{n-1}\,dr = \frac{\sigma(E)}{n}.$$

We therefore define $\sigma(E)$ to be $n\cdot m(E_1)$. Since the map $E\mapsto E_1$ takes Borel sets to Borel sets and commutes with unions, intersections, and complements, it is clear that $\sigma$ is a Borel measure on $S^{n-1}$. Also, since $E_a$ is the image of $E_1$ under the map $x\mapsto ax$, it follows from Theorem 2.44 that $m(E_a) = a^n m(E_1)$, and hence, if $0 < a < b$,

$$m_*((a,b]\times E) = m(E_b\setminus E_a) = \frac{b^n - a^n}{n}\,\sigma(E) = \sigma(E)\int_a^b r^{n-1}\,dr = \rho\times\sigma((a,b]\times E).$$
Fix $E\in\mathcal{B}_{S^{n-1}}$ and let $\mathcal{A}_E$ be the collection of finite disjoint unions of sets of the form $(a,b]\times E$. By Proposition 1.7, $\mathcal{A}_E$ is an algebra on $(0,\infty)\times E$ that generates the σ-algebra $\mathcal{M}_E = \{A\times E : A\in\mathcal{B}_{(0,\infty)}\}$. By the preceding calculation we have $m_* = \rho\times\sigma$ on $\mathcal{A}_E$, and hence by the uniqueness assertion of Theorem 1.14, $m_* = \rho\times\sigma$ on $\mathcal{M}_E$. But $\bigcup\{\mathcal{M}_E : E\in\mathcal{B}_{S^{n-1}}\}$ is precisely the set of Borel rectangles in $(0,\infty)\times S^{n-1}$, so another application of the uniqueness theorem shows that $m_* = \rho\times\sigma$ on all Borel sets. ∎
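The definition $\sigma(E) = n\cdot m(E_1)$ can be tried numerically. In the Monte Carlo sketch below (the function name, sample size, and seed are our choices), $E$ is the part of $S^{n-1}$ in the open positive orthant, so $E_1$ is the part of the unit ball in that orthant, and $m(E_1)$ is estimated as the fraction of uniform points of $[0,1]^n$ (which has measure 1) that land in the ball. By symmetry the exact answers are $\frac14\sigma(S^1) = \pi/2$ for $n = 2$ and $\frac18\cdot 4\pi = \pi/2$ for $n = 3$, using the familiar values $2\pi$ and $4\pi$.

```python
import math
import random

def sigma_orthant_estimate(n, samples=100_000, seed=0):
    """Monte Carlo estimate of sigma(E) = n * m(E_1), where E is the part of S^{n-1}
    with all coordinates positive and E_1 = {r x' : 0 < r <= 1, x' in E}."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(samples):
        # A uniform point of [0, 1]^n lies in E_1 (up to a null set) iff |x| <= 1.
        if sum(rng.random() ** 2 for _ in range(n)) <= 1.0:
            hits += 1
    return n * hits / samples

for n in (2, 3):
    print(n, sigma_orthant_estimate(n), math.pi / 2)
```

The estimates agree with $\pi/2$ to within the usual $O(\text{samples}^{-1/2})$ Monte Carlo error; this illustrates the definition of $\sigma$ but of course proves nothing.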


Of course, (2.50) can be extended to Lebesgue measurable functions by considering the completion of the measure $\rho\times\sigma$. Details are left to the reader.


2.51 Corollary. If $f$ is a measurable function on $\mathbb{R}^n$, nonnegative or integrable, such that $f(x) = g(|x|)$ for some function $g$ on $(0,\infty)$, then

$$\int f(x)\,dx = \sigma(S^{n-1})\int_0^\infty g(r)\,r^{n-1}\,dr.$$


2.52 Corollary. Let $c$ and $C$ denote positive constants, and let $B = \{x\in\mathbb{R}^n : |x| < c\}$. Suppose that $f$ is a measurable function on $\mathbb{R}^n$.
a. If $|f(x)|\le C|x|^{-a}$ on $B$ for some $a < n$, then $f\in L^1(B)$. However, if $|f(x)|\ge C|x|^{-n}$ on $B$, then $f\notin L^1(B)$.
b. If $|f(x)|\le C|x|^{-a}$ on $B^c$ for some $a > n$, then $f\in L^1(B^c)$. However, if $|f(x)|\ge C|x|^{-n}$ on $B^c$, then $f\notin L^1(B^c)$.


Proof. Apply Corollary 2.51 to $|x|^{-a}\chi_B$ and $|x|^{-a}\chi_{B^c}$. ∎


We shall compute $\sigma(S^{n-1})$ shortly. Of course, we know that $\sigma(S^1) = 2\pi$; this is just the definition of $2\pi$ as the ratio of the circumference of a circle to its radius. Armed with this fact, we can compute a very important integral.


2.53 Proposition. If $a > 0$,

$$\int_{\mathbb{R}^n}\exp(-a|x|^2)\,dx = \Big(\frac{\pi}{a}\Big)^{n/2}.$$


Proof. Denote the integral on the left by $I_n$. For $n = 2$, by Corollary 2.51 we have

$$I_2 = 2\pi\int_0^\infty re^{-ar^2}\,dr = -\frac{\pi}{a}e^{-ar^2}\Big|_0^\infty = \frac{\pi}{a}.$$

Since $\exp(-a|x|^2) = \prod_1^n\exp(-ax_j^2)$, Tonelli's theorem implies that $I_n = (I_1)^n$. In particular, $I_1 = (I_2)^{1/2}$, so $I_n = (I_2)^{n/2} = (\pi/a)^{n/2}$. ∎
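A quick numerical sanity check of Proposition 2.53: approximate $I_1 = \int_{-\infty}^\infty e^{-ax^2}\,dx$ by the midpoint rule on $[-L, L]$ and compare $(I_1)^n$, which equals $I_n$ by the Tonelli argument above, with $(\pi/a)^{n/2}$. The cutoff $L$, the step count, and the function name are our assumptions; for the values of $a$ used, the mass outside $[-L, L]$ is far below double precision.

```python
import math

def gaussian_1d(a, L=12.0, steps=20_000):
    """Midpoint-rule approximation of the integral of exp(-a x^2) over [-L, L]."""
    h = 2 * L / steps
    return h * sum(math.exp(-a * (-L + (i + 0.5) * h) ** 2) for i in range(steps))

for a in (0.5, 1.0, 3.0):
    I1 = gaussian_1d(a)
    for n in (1, 2, 3):
        # Tonelli: I_n = (I_1)^n; Proposition 2.53 predicts (pi / a)^(n / 2).
        print(a, n, I1 ** n, (math.pi / a) ** (n / 2))
```

The midpoint rule is used because it is extremely accurate for smooth, rapidly decaying integrands such as this one.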


Once we know this result, the device used in its proof can be turned around to compute $\sigma(S^{n-1})$ for all $n$ in terms of the gamma function introduced in §2.3.


2.54 Proposition. $\sigma(S^{n-1}) = \dfrac{2\pi^{n/2}}{\Gamma(n/2)}$.
Proof. By Corollary 2.51, Proposition 2.53, and the substitution $s = r^2$,

$$\pi^{n/2} = \int e^{-|x|^2}\,dx = \sigma(S^{n-1})\int_0^\infty r^{n-1}e^{-r^2}\,dr = \frac{\sigma(S^{n-1})}{2}\int_0^\infty s^{(n/2)-1}e^{-s}\,ds = \frac{\sigma(S^{n-1})}{2}\,\Gamma\Big(\frac n2\Big).$$ ∎

2.55 Corollary. If $B^n = \{x\in\mathbb{R}^n : |x| < 1\}$, then $m(B^n) = \dfrac{\pi^{n/2}}{\Gamma(\frac12n + 1)}$.

Proof. $m(B^n) = n^{-1}\sigma(S^{n-1})$ by definition of $\sigma$, and $\frac12n\,\Gamma(\frac12n) = \Gamma(\frac12n + 1)$ by the functional equation for the gamma function. ∎
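The formulas of Proposition 2.54 and Corollary 2.55 are easy to evaluate with the standard library's `math.gamma`; the sketch below (function names ours) prints $\sigma(S^{n-1})$, $m(B^n)$, and $\sigma(S^{n-1})/n$ for small $n$. Familiar values serve as checks: $\sigma(S^0) = 2$ (two points) and $m(B^1) = 2$, $\sigma(S^1) = 2\pi$ and $m(B^2) = \pi$, $\sigma(S^2) = 4\pi$ and $m(B^3) = \frac43\pi$.

```python
import math

def sphere_measure(n):
    """sigma(S^{n-1}) = 2 pi^{n/2} / Gamma(n/2)  (Proposition 2.54)."""
    return 2 * math.pi ** (n / 2) / math.gamma(n / 2)

def ball_measure(n):
    """m(B^n) = pi^{n/2} / Gamma(n/2 + 1)  (Corollary 2.55)."""
    return math.pi ** (n / 2) / math.gamma(n / 2 + 1)

for n in range(1, 9):
    # The last two columns agree, since m(B^n) = sigma(S^{n-1}) / n.
    print(n, sphere_measure(n), ball_measure(n), sphere_measure(n) / n)
```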


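We observed in §2.3 that $\Gamma(n) = (n - 1)!$. Now we can also evaluate the gamma function at the half-integers:


2.56 Proposition. $\Gamma(n + \frac12) = (n - \frac12)(n - \frac32)\cdots(\frac12)\sqrt{\pi}$.


Proof. We have $\Gamma(n + \frac12) = (n - \frac12)(n - \frac32)\cdots(\frac12)\Gamma(\frac12)$ by the functional equation, and by Proposition 2.53 and the substitution $s = r^2$,

$$\Gamma(\tfrac12) = \int_0^\infty s^{-1/2}e^{-s}\,ds = 2\int_0^\infty e^{-r^2}\,dr = \int_{-\infty}^\infty e^{-r^2}\,dr = \sqrt{\pi}.$$ ∎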


An amusing consequence of Proposition 2.56 and the formula $\Gamma(n) = (n - 1)!$ is that the surface measure of the unit sphere and the Lebesgue measure of the unit ball in $\mathbb{R}^n$ are always rational multiples of integer powers of $\pi$, and the power of $\pi$ increases by 1 when $n$ increases by 2.
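The rational coefficient can be computed exactly. For $n = 2k$, Corollary 2.55 gives $m(B^n) = \pi^k/k!$; for $n = 2k + 1$, Proposition 2.56 gives $\Gamma(k + \frac32) = (k + \frac12)(k - \frac12)\cdots(\frac12)\sqrt{\pi}$, so $m(B^n) = \pi^k/[(k + \frac12)(k - \frac12)\cdots(\frac12)]$. In both cases the power of $\pi$ is $\lfloor n/2\rfloor$, and $\sigma(S^{n-1}) = n\,m(B^n)$ has the same power. A small sketch (function name ours) using exact fractions:

```python
import math
from fractions import Fraction

def ball_measure_exact(n):
    """Return (q, p) with m(B^n) = q * pi^p, where q is a Fraction and p = n // 2."""
    k, odd = divmod(n, 2)
    if not odd:
        return Fraction(1, math.factorial(k)), k        # pi^k / k!
    denom = Fraction(1)
    for j in range(k + 1):                              # (1/2)(3/2)...(k + 1/2)
        denom *= Fraction(2 * j + 1, 2)
    return 1 / denom, k

for n in range(1, 9):
    q, p = ball_measure_exact(n)
    print(n, q, p, float(q) * math.pi ** p, math.pi ** (n / 2) / math.gamma(n / 2 + 1))
```

For example, `ball_measure_exact(3)` returns `(Fraction(4, 3), 1)` and `ball_measure_exact(5)` returns `(Fraction(8, 15), 2)`, matching $m(B^3) = \frac43\pi$ and $m(B^5) = \frac{8}{15}\pi^2$.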


Exercises 


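62. The measure $\sigma$ on $S^{n-1}$ is invariant under rotations.


63. The technique used to prove Proposition 2.54 can also be used to integrate any polynomial over $S^{n-1}$. In fact, suppose $f(x) = \prod_1^n x_j^{\alpha_j}$ ($\alpha_j\in\mathbb{N}\cup\{0\}$) is a monomial. Then $\int f\,d\sigma = 0$ if any $\alpha_j$ is odd, and if all $\alpha_j$'s are even,

$$\int f\,d\sigma = \frac{2\,\Gamma(\beta_1)\Gamma(\beta_2)\cdots\Gamma(\beta_n)}{\Gamma(\beta_1 + \beta_2 + \cdots + \beta_n)}\qquad\text{where }\beta_j = \frac{\alpha_j + 1}{2}.$$


64. For which real values of $a$ and $b$ is $|x|^a\,\big|\log|x|\big|^b$ integrable over $\{x\in\mathbb{R}^n : |x| < \frac12\}$? Over $\{x\in\mathbb{R}^n : |x| > 2\}$?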


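65. Define $G:\mathbb{R}^n\to\mathbb{R}^n$ by $G(r,\phi_1,\ldots,\phi_{n-2},\theta) = (x_1,\ldots,x_n)$ where

$$x_1 = r\cos\phi_1,\quad x_2 = r\sin\phi_1\cos\phi_2,\quad x_3 = r\sin\phi_1\sin\phi_2\cos\phi_3,\ \ldots,$$
$$x_{n-1} = r\sin\phi_1\cdots\sin\phi_{n-2}\cos\theta,\quad x_n = r\sin\phi_1\cdots\sin\phi_{n-2}\sin\theta.$$

a. $G$ maps $\mathbb{R}^n$ onto $\mathbb{R}^n$, and $|G(r,\phi_1,\ldots,\phi_{n-2},\theta)| = |r|$.

b. $\det D_{(r,\phi_1,\ldots,\phi_{n-2},\theta)}G = r^{n-1}\sin^{n-2}\phi_1\sin^{n-3}\phi_2\cdots\sin\phi_{n-2}$.

c. Let $\Omega = (0,\infty)\times(0,\pi)^{n-2}\times(0,2\pi)$. Then $G|\Omega$ is a diffeomorphism and $m(\mathbb{R}^n\setminus G(\Omega)) = 0$.

d. Let $F(\phi_1,\ldots,\phi_{n-2},\theta) = G(1,\phi_1,\ldots,\phi_{n-2},\theta)$ and $\Omega' = (0,\pi)^{n-2}\times(0,2\pi)$. Then $(F|\Omega')^{-1}$ defines a coordinate system on $S^{n-1}$ except on a σ-null set, and the measure $\sigma$ is given in these coordinates by

$$d\sigma(\phi_1,\ldots,\phi_{n-2},\theta) = \sin^{n-2}\phi_1\sin^{n-3}\phi_2\cdots\sin\phi_{n-2}\,d\phi_1\cdots d\phi_{n-2}\,d\theta.$$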
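Part (b) can be spot-checked numerically before it is proved. The sketch below (all names are ours) implements the map $G$ of this exercise, approximates its Jacobian matrix by central differences, computes the determinant by Gaussian elimination, and compares its absolute value, which is what Theorem 2.47 uses, with $r^{n-1}\sin^{n-2}\phi_1\cdots\sin\phi_{n-2}$ at a few sample points. This is an illustration at finitely many points, not a proof.

```python
import math

def G(params):
    """The map of Exercise 65: (r, phi_1, ..., phi_{n-2}, theta) -> (x_1, ..., x_n)."""
    r, *angles = params
    *phis, theta = angles
    x, s = [], r                      # s = r sin(phi_1) ... sin(phi_{j-1})
    for phi in phis:
        x.append(s * math.cos(phi))
        s *= math.sin(phi)
    x.append(s * math.cos(theta))
    x.append(s * math.sin(theta))
    return x

def jacobian(F, p, h=1e-6):
    """Central-difference approximation of the matrix (dF_i / dp_j)."""
    n = len(p)
    J = [[0.0] * n for _ in range(n)]
    for j in range(n):
        plus, minus = list(p), list(p)
        plus[j] += h
        minus[j] -= h
        fp, fm = F(plus), F(minus)
        for i in range(n):
            J[i][j] = (fp[i] - fm[i]) / (2 * h)
    return J

def det(A):
    """Determinant by Gaussian elimination with partial pivoting."""
    A = [row[:] for row in A]
    n, d = len(A), 1.0
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(A[r][c]))
        if A[p][c] == 0.0:
            return 0.0
        if p != c:
            A[c], A[p] = A[p], A[c]
            d = -d
        d *= A[c][c]
        for r in range(c + 1, n):
            factor = A[r][c] / A[c][c]
            for k in range(c, n):
                A[r][k] -= factor * A[c][k]
    return d

def predicted(params):
    """r^{n-1} sin^{n-2}(phi_1) sin^{n-3}(phi_2) ... sin(phi_{n-2})."""
    n = len(params)
    r, phis = params[0], params[1:-1]
    value = r ** (n - 1)
    for i, phi in enumerate(phis):
        value *= math.sin(phi) ** (n - 2 - i)
    return value

points = [(2.0, 0.9), (2.0, 0.7, 1.1), (1.5, 0.4, 0.9, 2.0), (1.3, 0.6, 1.2, 2.5, 4.0)]
for p in points:
    print(len(p), abs(det(jacobian(G, p))), predicted(p))
```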


2.8 NOTES AND REFERENCES 


The history of modern measure and integration theory can fairly be said to have 
begun with the publication of Lebesgue’s thesis [91] in 1902, although of course 
Lebesgue was building on earlier works of other mathematicians, and some of his 
results were obtained independently by Vitali and W. H. Young. The theory of the 
Lebesgue integral was extensively developed by a number of mathematicians in the 
ensuing decade, during which time most of the results in this chapter were first 
derived. In particular, Lebesgue himself proved the dominated convergence theorem 
and deduced the monotone convergence theorem from it in the case when the limit 
function $f$ is integrable; when $\int f = \infty$ the latter theorem is due to B. Levi.

Lebesgue [92] studied more general measures on $\mathbb{R}^n$ (which he called “additive set functions”) in connection with the problem of generalizing the notion of indefinite integrals to functions of several variables. Radon [111] then developed the theory of integration with respect to what we now call regular Borel measures on $\mathbb{R}^n$, which in particular yields the Lebesgue-Stieltjes integrals when $n = 1$. Finally, in 1915 Fréchet [53] pointed out that many of Radon's ideas would work in the general setting of sets equipped with σ-algebras. Thus was abstract measure and integration theory
born. It continued to develop until, by about 1950, it had assumed more or less the 
form in which we know it today. The first systematic modern treatise on the subject 
is Halmos [62]. 

For accounts of the prehistory and early history of the Lebesgue integral, see 
Hawkins [70]. References concerning the later development of the subject can be 
found in Saks [128] and Hahn and Rosenthal [61]. 

We have adopted the point of view of beginning with measures and deriving integrals from them. However, it is also possible to go the other way, a procedure first developed by Daniell [29]. Roughly speaking, one starts with an “elementary integral”: a linear functional $J$ defined on a suitable space of functions that satisfies some mild continuity conditions and is positive in the sense that $J(f)\ge 0$ whenever $f\ge 0$ (for example, the Riemann integral on the space of continuous functions on $[a,b]$). The Daniell theory provides an extension of $J$ to a functional $I$ defined on a larger class of functions. Under appropriate hypotheses, the collection $\mathcal{M}$ of sets $E$ such that $\chi_E$ is in the domain of $I$ is then a σ-algebra, the function $\mu(E) = I(\chi_E)$ is a measure on $\mathcal{M}$, and $I$ is integration with respect to $\mu$. See Royden [121] for a concise account of the Daniell theory and Pfeffer [108] for a comprehensive treatment, as well as König [86] for a somewhat different approach.

The Lebesgue theory is not the last word regarding integration on $\mathbb{R}$. Motivated partly by the problem of establishing the fundamental theorem of calculus in the greatest possible generality (about which we shall say more in §3.6), a number of theories of integration have been developed that include not only the Lebesgue integral but also certain “conditionally convergent” integrals. That is, they assign a meaning to $\int f(x)\,dx$ for certain measurable functions $f:\mathbb{R}\to\mathbb{R}$ such that $\int f^+ = \int f^- = \infty$, but for which the cancellation of positive and negative values in some way yields a reasonable definition of $\int f(x)\,dx$. (A standard example is $f(x) = x^{-1}\sin x$; see Exercise 59.) The first procedures for defining such integrals, due to Denjoy and Perron, were quite complicated. However, in the late 1950s, Henstock and Kurzweil independently discovered a modification of the classical Riemann integral that yields the same results.

The Henstock-Kurzweil integral on a bounded interval $[a,b]$ is defined as follows. A tagged partition of $[a,b]$ is a finite sequence $\{x_j\}_0^n$ such that $a = x_0 < \cdots < x_n = b$ (i.e., a partition in the sense of §2.3) together with another finite sequence $\{t_j\}_1^n$ such that $t_j\in[x_{j-1},x_j]$. A gauge on $[a,b]$ is an (arbitrary!) function $\delta:[a,b]\to(0,\infty)$. If $P$ is a tagged partition and $\delta$ is a gauge, $P$ is called $\delta$-fine if $x_j - x_{j-1} < \delta(t_j)$ for all $j$. The compactness of $[a,b]$ easily implies that for any gauge $\delta$ there is a $\delta$-fine tagged partition of $[a,b]$.

Now suppose $f$ is a real-valued function on $[a,b]$. If $P$ is a tagged partition of $[a,b]$, the corresponding Riemann sum for $f$ is $\Sigma_Pf = \sum_1^n f(t_j)(x_j - x_{j-1})$. The function $f$ is called Henstock-Kurzweil integrable on $[a,b]$ if there exists $c\in\mathbb{R}$ with the following property: For any $\epsilon > 0$ there is a gauge $\delta_\epsilon$ such that if $P$ is any $\delta_\epsilon$-fine tagged partition of $[a,b]$, then $|\Sigma_Pf - c| < \epsilon$. In this case the number $c$ is unique, and it is called the Henstock-Kurzweil integral of $f$. The ordinary Riemann integral of $f$, in contrast, can be defined in exactly the same way except that one allows only constant gauges.
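The definitions of $\delta$-fineness and of $\Sigma_Pf$ translate directly into code. The sketch below (all names hypothetical) also builds a $\delta$-fine tagged partition by repeated bisection, in the spirit of the compactness argument just mentioned (often called Cousin's lemma): an interval is accepted once its left endpoint, midpoint, or right endpoint works as a tag, and is bisected otherwise. Because only these three candidate tags are tried, the search can fail for sufficiently irregular gauges, so a depth limit is imposed; the sketch illustrates the definitions and does not replace the proof.

```python
def is_delta_fine(xs, ts, delta):
    """xs = [x_0, ..., x_n] increasing; ts = [t_1, ..., t_n] with t_j in [x_{j-1}, x_j].
    The tagged partition is delta-fine when x_j - x_{j-1} < delta(t_j) for every j."""
    return all(
        xs[j - 1] <= ts[j - 1] <= xs[j] and xs[j] - xs[j - 1] < delta(ts[j - 1])
        for j in range(1, len(xs))
    )

def riemann_sum(f, xs, ts):
    """Sigma_P f = sum over j of f(t_j) (x_j - x_{j-1})."""
    return sum(f(t) * (xs[j + 1] - xs[j]) for j, t in enumerate(ts))

def delta_fine_partition(a, b, delta, max_depth=60):
    """Bisect [a, b] until each piece [u, v] has a tag t in {u, (u+v)/2, v} with v - u < delta(t)."""
    pieces = []                                  # (u, v, tag), in left-to-right order

    def split(u, v, depth):
        for t in (u, (u + v) / 2, v):
            if v - u < delta(t):
                pieces.append((u, v, t))
                return
        if depth >= max_depth:
            raise RuntimeError("no tag found; gauge too irregular for this sketch")
        m = (u + v) / 2
        split(u, m, depth + 1)
        split(m, v, depth + 1)

    split(a, b, 0)
    xs = [a] + [v for _, v, _ in pieces]
    ts = [t for _, _, t in pieces]
    return xs, ts

gauge = lambda t: 0.001 + 0.1 * t                # a non-constant gauge on [0, 1]
xs, ts = delta_fine_partition(0.0, 1.0, gauge)
print(len(ts), is_delta_fine(xs, ts, gauge), riemann_sum(lambda x: x * x, xs, ts))
```

With a constant gauge the same routine produces the uniform partitions familiar from the Riemann integral; the gauge above instead forces short intervals near $0$ and allows longer ones near $1$.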

It turns out that the Henstock-Kurzweil integral coincides with the integrals of Denjoy and Perron. In particular, it coincides with the Lebesgue integral for nonnegative functions, but its domain includes many functions that have both positive and negative values and are not in $L^1([a,b])$. The definition of the Henstock-Kurzweil integral is easily extended to unbounded intervals. It also admits an $n$-dimensional version: One simply defines an $n$-interval to be a product of $n$ one-dimensional intervals and a tagged partition of an $n$-interval $I$ to be a finite collection $\{I_j\}$ of $n$-intervals with disjoint interiors whose union is $I$ together with a choice of $t_j\in I_j$ for each $j$; the definition of the integral then proceeds as above.

A good case can be made that the Henstock-Kurzweil integral ought to be the theory of integration on $\mathbb{R}^n$ that is generally taught to students, not just because of its added generality but (more cogently) because its definition is relatively simple and requires no measure theory to get started. On the other hand, it does not generalize as readily to spaces other than $\mathbb{R}^n$, and although it can be developed in a rather abstract setting, it loses much of its appealing simplicity there. Moreover, although conditionally convergent integrals that cannot be obtained by a simple limiting procedure from absolutely convergent ones do turn up now and then in certain problems, their utility is not sufficiently broad to make a compelling case for their study by nonspecialists.

In any case, in this book we shall content ourselves with the Lebesgue integral 
and the general theory of measure and integration of which it is a part. Readers who 
wish to learn more about the Henstock-Kurzweil integral can find a brief introduction 
in Bartle [13] and detailed treatments in McLeod [99] and Pfeffer [109]. See also 
Gordon [57] for a comprehensive account of the Denjoy, Perron, and Henstock- 
Kurzweil integrals on [a, b], and Henstock [72] for a development of the theory in a 
more abstract setting. 


§2.1: A Borel isomorphism between two measurable spaces $(X, \mathcal{M})$ and $(Y, \mathcal{N})$
is a bijection $f : X \to Y$ such that $f^{-1}$ is a bijection from $\mathcal{N}$ to $\mathcal{M}$. Unlike the
related notion of homeomorphism for topological spaces (see Chapter 4) and notions
of isomorphism in various other categories, the notion of Borel isomorphism is of
limited utility, because it is too easy for two spaces to be Borel isomorphic. That
this is so is clearly indicated by the single major theorem in the subject, due to
Kuratowski:


Suppose that $(X, \mathcal{M})$ is Borel isomorphic to a Borel subset $E$ of a complete
separable metric space $Y$ (equipped with the $\sigma$-algebra $\{F \in \mathcal{B}_Y : F \subset E\}$).
Then either $X$ is countable and $\mathcal{M} = \mathcal{P}(X)$, or $X$ is Borel isomorphic to
$(\mathbb{R}, \mathcal{B}_{\mathbb{R}})$.


A proof of this theorem, as well as much additional information about Borel sets, can 
be found in Srivastava [139]. 

There is a hierarchy of Borel measurable functions on a metric space that corresponds
roughly to the hierarchy of Borel sets (open and closed, $F_\sigma$ and $G_\delta$, etc.).
Namely, let $B_0$ be the space of all continuous functions, and for each countable
ordinal $\alpha$ define $B_\alpha$ recursively as follows. If $\alpha$ has an immediate predecessor
$\beta$, $B_\alpha$ is the set of all limits of pointwise convergent sequences in $B_\beta$; otherwise,
$B_\alpha = \bigcup_{\beta < \alpha} B_\beta$. Functions in $B_\alpha$ are said to be of Baire class $\alpha$. For example, if $f$
is everywhere differentiable on $\mathbb{R}$, $f'$ is of Baire class 1.

Exercise 11 is a result from Lebesgue’s first published paper. See Rudin [123] for 
a discussion of it. 


§2.3: The blurring of the distinction between individual measurable functions
and equivalence classes of functions defined by almost-everywhere equality is often
convenient and rarely disastrous. The most common situations where some care is
needed involve the interplay of measurable and continuous functions (on $\mathbb{R}^n$, say),
for a function that is equal a.e. to a continuous function will not be continuous in
general. See Zaanen [165] for a careful discussion of this point.


§2.4: An interesting discussion of Egoroff’s theorem, including some necessary
and sufficient conditions for almost uniform convergence, can be found in Bartle 
[12]. For a simple proof of Lusin’s theorem (Exercise 44) that does not depend on 
Egoroff’s theorem, see Feldman [43]. We shall prove a more general form of this 
theorem in §7.2. 
§2.5: The original theorems of Fubini and Tonelli pertained to Lebesgue measure
in the plane. The theory of abstract product measures was developed independently
by several people in the 1930s; the construction of $\mu \times \nu$ presented here is that of
Hahn [60]. It is also possible to define a product measure on the product of an infinite
family $\{(X_\alpha, \mathcal{M}_\alpha, \mu_\alpha)\}_{\alpha \in A}$ of measure spaces provided that $\mu_\alpha(X_\alpha) = 1$ for all but
finitely many $\alpha$; see Saeki [127], Halmos [62, §38], or Hewitt and Stromberg [76,
§22]. We shall present a version of this result in §7.4 (Theorem 7.28).

Using the axiom of choice but not the continuum hypothesis, Sierpinski [134] has
proved the existence of a Lebesgue nonmeasurable subset of $\mathbb{R}^2$ whose intersection
with any straight line contains at most two points. This should be compared with
Exercise 47 (which is also due to Sierpinski).

The following generalization of the notion of product measures is useful in a
number of situations: One is given a measurable space $(X, \mathcal{M})$, a $\sigma$-finite measure
space $(Y, \mathcal{N}, \nu)$, and a family $\{\mu_y : y \in Y\}$ of finite measures on $X$ such that the
function $y \mapsto \mu_y(E)$ is measurable on $Y$ for each $E \in \mathcal{M}$. One can then define a
measure $\lambda$ on $X \times Y$ such that $\int f\,d\lambda = \iint f(x, y)\,d\mu_y(x)\,d\nu(y)$ for $f \in L^+(X \times Y)$.
See Johnson [79].


§2.6: Our proof of Theorem 2.47 follows J. Schwartz [131]. This theorem can
also be proved under slightly weaker hypotheses on the transformation $G$; see Rudin
[125, Theorem 7.26].





Signed Measures and 
Differentiation 


The principal theme of this chapter is the concept of differentiating a measure $\nu$ with
respect to another measure $\mu$ on the same $\sigma$-algebra. We do this first on the abstract
level, then obtain a more refined result when $\mu$ is Lebesgue measure on $\mathbb{R}^n$. When
the latter is specialized to the case $n = 1$, it joins with classical real-variable theory
to produce a version of the fundamental theorem of calculus for Lebesgue integrals.

In developing this program it is useful to generalize the notion of measure so as to
allow measures to assume negative or even complex values. There are three reasons
for this. First, in applications such “signed measures” can represent things such
as electric charge that can be either positive or negative. Second, the differentiation
theory proceeds more naturally in the more general setting. Finally, complex measures
have a functional-analytic significance that will be explained in Chapter 7.


3.1 SIGNED MEASURES 


Let $(X, \mathcal{M})$ be a measurable space. A signed measure on $(X, \mathcal{M})$ is a function
$\nu : \mathcal{M} \to [-\infty, \infty]$ such that


• $\nu(\varnothing) = 0$;

• $\nu$ assumes at most one of the values $\pm\infty$;

• if $\{E_j\}$ is a sequence of disjoint sets in $\mathcal{M}$, then $\nu(\bigcup_1^\infty E_j) = \sum_1^\infty \nu(E_j)$,
where the latter sum converges absolutely if $\nu(\bigcup_1^\infty E_j)$ is finite.


Thus every measure is a signed measure; for emphasis we shall sometimes refer to
measures as positive measures.
Two examples of signed measures come readily to mind. First, if $\mu_1, \mu_2$ are
measures on $\mathcal{M}$ and at least one of them is finite, then $\nu = \mu_1 - \mu_2$ is a signed
measure. Second, if $\mu$ is a measure on $\mathcal{M}$ and $f : X \to [-\infty, \infty]$ is a measurable
function such that at least one of $\int f^+\,d\mu$ and $\int f^-\,d\mu$ is finite (in which case we
shall call $f$ an extended $\mu$-integrable function), then the set function $\nu$ defined by
$\nu(E) = \int_E f\,d\mu$ is a signed measure. In fact, we shall see shortly that these are
really the only examples: Every signed measure can be represented in either of these
two forms.


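3.1 Proposition. Let $\nu$ be a signed measure on $(X, \mathcal{M})$. If $\{E_j\}$ is an increasing
sequence in $\mathcal{M}$, then $\nu(\bigcup_1^\infty E_j) = \lim_{j \to \infty} \nu(E_j)$. If $\{E_j\}$ is a decreasing sequence
in $\mathcal{M}$ and $\nu(E_1)$ is finite, then $\nu(\bigcap_1^\infty E_j) = \lim_{j \to \infty} \nu(E_j)$.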


The proof is essentially the same as for positive measures (Theorem 1.8) and is 
left to the reader (Exercise 1). 

If $\nu$ is a signed measure on $(X, \mathcal{M})$, a set $E \in \mathcal{M}$ is called positive (resp. negative,
null) for $\nu$ if $\nu(F) \ge 0$ (resp. $\nu(F) \le 0$, $\nu(F) = 0$) for all $F \in \mathcal{M}$ such that $F \subset E$.
(Thus, in the example $\nu(E) = \int_E f\,d\mu$ described above, $E$ is positive, negative, or
null precisely when $f \ge 0$, $f \le 0$, or $f = 0$ $\mu$-a.e. on $E$.)


3.2 Lemma. Any measurable subset of a positive set is positive, and the union of 
any countable family of positive sets is positive. 


Proof. The first assertion is obvious from the definition of positivity. If $P_1, P_2, \ldots$
are positive sets, let $Q_n = P_n \setminus \bigcup_1^{n-1} P_j$. Then $Q_n \subset P_n$, so $Q_n$ is positive. Hence
if $E \subset \bigcup_1^\infty P_j$, then $\nu(E) = \sum_1^\infty \nu(E \cap Q_j) \ge 0$, as desired. ∎


3.3 The Hahn Decomposition Theorem. If $\nu$ is a signed measure on $(X, \mathcal{M})$, there
exist a positive set $P$ and a negative set $N$ for $\nu$ such that $P \cup N = X$ and $P \cap N = \varnothing$.
If $P', N'$ is another such pair, then $P \,\triangle\, P'$ $(= N \,\triangle\, N')$ is null for $\nu$.


Proof. Without loss of generality, we assume that $\nu$ does not assume the value
$+\infty$. (Otherwise, consider $-\nu$.) Let $m$ be the supremum of $\nu(E)$ as $E$ ranges over
all positive sets; thus there is a sequence $\{P_j\}$ of positive sets such that $\nu(P_j) \to m$.
Let $P = \bigcup_1^\infty P_j$. By Lemma 3.2 and Proposition 3.1, $P$ is positive and $\nu(P) = m$;
in particular, $m < \infty$. We claim that $N = X \setminus P$ is negative. To this end, we assume
that $N$ is not negative and derive a contradiction.

First, notice that $N$ cannot contain any nonnull positive sets. Indeed, if $E \subset N$ is
positive and $\nu(E) > 0$, then $E \cup P$ is positive and $\nu(E \cup P) = \nu(E) + \nu(P) > m$,
which is impossible.

Second, if $A \subset N$ and $\nu(A) > 0$, there exists $B \subset A$ with $\nu(B) > \nu(A)$. Indeed,
since $A$ cannot be positive, there exists $C \subset A$ with $\nu(C) < 0$; thus if $B = A \setminus C$
we have $\nu(B) = \nu(A) - \nu(C) > \nu(A)$.

If $N$ is not negative, then, we can specify a sequence of subsets $\{A_j\}$ of $N$ and
a sequence $\{n_j\}$ of positive integers as follows: $n_1$ is the smallest integer for which
there exists a set $B \subset N$ with $\nu(B) > n_1^{-1}$, and $A_1$ is such a set. Proceeding
inductively, $n_j$ is the smallest integer for which there exists a set $B \subset A_{j-1}$ with
$\nu(B) > \nu(A_{j-1}) + n_j^{-1}$, and $A_j$ is such a set.

Let $A = \bigcap_1^\infty A_j$. Then $\infty > \nu(A) = \lim \nu(A_j) > \sum_1^\infty n_j^{-1}$, so $n_j \to \infty$
as $j \to \infty$. But once again, there exists $B \subset A$ with $\nu(B) > \nu(A) + n^{-1}$ for
some integer $n$. For $j$ sufficiently large we have $n < n_j$, and $B \subset A_{j-1}$, which
contradicts the construction of $n_j$ and $A_j$. Thus the assumption that $N$ is not negative
is untenable.

Finally, if $P', N'$ is another pair of sets as in the statement of the theorem, we
have $P \setminus P' \subset P$ and $P \setminus P' \subset N'$, so that $P \setminus P'$ is both positive and negative,
hence null; likewise for $P' \setminus P$. ∎


The decomposition $X = P \cup N$ of $X$ as the disjoint union of a positive set and a
negative set is called a Hahn decomposition for $\nu$. It is usually not unique ($\nu$-null
sets can be transferred from $P$ to $N$ or from $N$ to $P$), but it leads to a canonical
representation of $\nu$ as the difference of two positive measures.

To state this result we need a new concept: We say that two signed measures $\mu$ and
$\nu$ on $(X, \mathcal{M})$ are mutually singular, or that $\nu$ is singular with respect to $\mu$, or vice
versa, if there exist $E, F \in \mathcal{M}$ such that $E \cap F = \varnothing$, $E \cup F = X$, $E$ is null for $\mu$,
and $F$ is null for $\nu$. Informally speaking, mutual singularity means that $\mu$ and $\nu$ “live
on disjoint sets.” We express this relationship symbolically with the perpendicularity
sign:

$$\mu \perp \nu.$$


3.4 The Jordan Decomposition Theorem. If $\nu$ is a signed measure, there exist
unique positive measures $\nu^+$ and $\nu^-$ such that $\nu = \nu^+ - \nu^-$ and $\nu^+ \perp \nu^-$.


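Proof. Let $X = P \cup N$ be a Hahn decomposition for $\nu$, and define
$\nu^+(E) = \nu(E \cap P)$ and $\nu^-(E) = -\nu(E \cap N)$. Then clearly $\nu = \nu^+ - \nu^-$ and $\nu^+ \perp \nu^-$. If
also $\nu = \mu^+ - \mu^-$ and $\mu^+ \perp \mu^-$, let $E, F \in \mathcal{M}$ be such that $E \cap F = \varnothing$, $E \cup F = X$,
and $\mu^+(F) = \mu^-(E) = 0$. Then $X = E \cup F$ is another Hahn decomposition for $\nu$,
so $P \,\triangle\, E$ is $\nu$-null. Therefore, for any $A \in \mathcal{M}$,
$\mu^+(A) = \mu^+(A \cap E) = \nu(A \cap E) = \nu(A \cap P) = \nu^+(A)$, and likewise $\nu^- = \mu^-$. ∎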


The measures $\nu^+$ and $\nu^-$ are called the positive and negative variations of $\nu$,
and $\nu = \nu^+ - \nu^-$ is called the Jordan decomposition of $\nu$, by analogy with the
representation of a function of bounded variation on $\mathbb{R}$ as the difference of two
increasing functions (see §3.5). Furthermore, we define the total variation of $\nu$ to
be the measure $|\nu|$ defined by

$$|\nu| = \nu^+ + \nu^-.$$


It is easily verified that $E \in \mathcal{M}$ is $\nu$-null iff $|\nu|(E) = 0$, and $\nu \perp \mu$ iff $|\nu| \perp \mu$ iff
$\nu^+ \perp \mu$ and $\nu^- \perp \mu$ (Exercise 2).

We observe that if $\nu$ omits the value $\infty$ then $\nu^+(X) = \nu(P) < \infty$, so that $\nu^+$ is a
finite measure and $\nu$ is bounded above by $\nu^+(X)$; similarly if $\nu$ omits the value $-\infty$.
In particular, if the range of $\nu$ is contained in $\mathbb{R}$, then $\nu$ is bounded. We observe also
that $\nu$ is of the form $\nu(E) = \int_E f\,d\mu$, where $\mu = |\nu|$ and $f = \chi_P - \chi_N$, $X = P \cup N$
being a Hahn decomposition for $\nu$.
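On a finite set $X$ with $\mathcal{M} = \mathcal{P}(X)$, a finite signed measure is determined by the real numbers $w(x) = \nu(\{x\})$, and the constructions above become explicit. The Python sketch below is an illustration under that finite-set assumption, with hypothetical function names. It takes $P = \{x : w(x) \ge 0\}$, builds $\nu^+$, $\nu^-$, and $|\nu|$ as in the proof of Theorem 3.4, and checks $\nu = \nu^+ - \nu^-$ and $d\nu = (\chi_P - \chi_N)\,d|\nu|$ on every subset.

```python
from itertools import chain, combinations


def subsets(points):
    """All subsets of a finite set, as tuples."""
    pts = list(points)
    return chain.from_iterable(combinations(pts, k) for k in range(len(pts) + 1))


def nu(w, E):
    """nu(E) = sum of the point masses w[x] over x in E."""
    return sum(w[x] for x in E)


def hahn(w):
    """A Hahn decomposition X = P ∪ N: P collects the points of nonnegative mass.

    Points with w[x] == 0 are nu-null and could equally well go into N,
    which is why the decomposition is usually not unique.
    """
    P = {x for x in w if w[x] >= 0}
    return P, set(w) - P


def jordan(w):
    """Point masses of nu+, nu- and |nu| = nu+ + nu-, built from a Hahn decomposition."""
    P, N = hahn(w)
    plus = {x: (w[x] if x in P else 0) for x in w}
    minus = {x: (-w[x] if x in N else 0) for x in w}
    total = {x: plus[x] + minus[x] for x in w}
    return plus, minus, total


w = {"a": 3, "b": -2, "c": 0, "d": -1}
P, N = hahn(w)
plus, minus, total = jordan(w)
for E in subsets(w):
    assert nu(w, E) == nu(plus, E) - nu(minus, E)                        # nu = nu+ - nu-
    assert nu(w, E) == sum((1 if x in P else -1) * total[x] for x in E)  # d(nu) = (chi_P - chi_N) d|nu|
print(sorted(P), sorted(N), nu(total, w))  # ['a', 'c'] ['b', 'd'] 6
```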

Integration with respect to a signed measure $\nu$ is defined in the obvious way: We
set

$$L^1(\nu) = L^1(\nu^+) \cap L^1(\nu^-), \qquad \int f\,d\nu = \int f\,d\nu^+ - \int f\,d\nu^- \quad (f \in L^1(\nu)).$$


One more piece of terminology: a signed measure $\nu$ is called finite (resp. $\sigma$-finite)
if $|\nu|$ is finite (resp. $\sigma$-finite).


Exercises 


1. Prove Proposition 3.1. 


2. If $\nu$ is a signed measure, $E$ is $\nu$-null iff $|\nu|(E) = 0$. Also, if $\nu$ and $\mu$ are signed
measures, $\nu \perp \mu$ iff $|\nu| \perp \mu$ iff $\nu^+ \perp \mu$ and $\nu^- \perp \mu$.
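3. Let $\nu$ be a signed measure on $(X, \mathcal{M})$.

a. $L^1(\nu) = L^1(|\nu|)$.

b. If $f \in L^1(\nu)$, $|\int f\,d\nu| \le \int |f|\,d|\nu|$.

c. If $E \in \mathcal{M}$, $|\nu|(E) = \sup\{|\int_E f\,d\nu| : |f| \le 1\}$.

4. If $\nu$ is a signed measure and $\lambda, \mu$ are positive measures such that $\nu = \lambda - \mu$,
then $\lambda \ge \nu^+$ and $\mu \ge \nu^-$.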





5. If $\nu_1, \nu_2$ are signed measures that both omit the value $+\infty$ or $-\infty$, then
$|\nu_1 + \nu_2| \le |\nu_1| + |\nu_2|$. (Use Exercise 4.)


6. Suppose $\nu(E) = \int_E f\,d\mu$ where $\mu$ is a positive measure and $f$ is an extended
$\mu$-integrable function. Describe the Hahn decompositions of $\nu$ and the positive,
negative, and total variations of $\nu$ in terms of $f$ and $\mu$.


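7. Suppose that $\nu$ is a signed measure on $(X, \mathcal{M})$ and $E \in \mathcal{M}$.
a. $\nu^+(E) = \sup\{\nu(F) : F \in \mathcal{M}, F \subset E\}$ and $\nu^-(E) = -\inf\{\nu(F) : F \in \mathcal{M}, F \subset E\}$.
b. $|\nu|(E) = \sup\{\sum_1^n |\nu(E_j)| : n \in \mathbb{N},\ E_1, \ldots, E_n \text{ are disjoint, and } \bigcup_1^n E_j = E\}$.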


3.2 THE LEBESGUE-RADON-NIKODYM THEOREM 


Suppose that $\nu$ is a signed measure and $\mu$ is a positive measure on $(X, \mathcal{M})$. We say
that $\nu$ is absolutely continuous with respect to $\mu$ and write

$$\nu \ll \mu$$

if $\nu(E) = 0$ for every $E \in \mathcal{M}$ for which $\mu(E) = 0$. It is easily verified that $\nu \ll \mu$ iff
$|\nu| \ll \mu$ iff $\nu^+ \ll \mu$ and $\nu^- \ll \mu$ (Exercise 8). Absolute continuity is in a sense the
antithesis of mutual singularity. More precisely, if $\nu \perp \mu$ and $\nu \ll \mu$, then $\nu = 0$, for
if $E$ and $F$ are disjoint sets such that $E \cup F = X$ and $\mu(E) = |\nu|(F) = 0$, then the
fact that $\nu \ll \mu$ implies that $|\nu|(E) = 0$, whence $|\nu| = 0$ and $\nu = 0$. One can extend
the notion of absolute continuity to the case where $\mu$ is a signed measure (namely,
$\nu \ll \mu$ iff $\nu \ll |\mu|$), but we shall have no need of this more general definition.

The term “absolute continuity” is derived from real-variable theory; see §3.5. For 
finite signed measures it is equivalent to another condition that is obviously a form 
of continuity. 





3.5 Theorem. Let $\nu$ be a finite signed measure and $\mu$ a positive measure on $(X, \mathcal{M})$.
Then $\nu \ll \mu$ iff for every $\epsilon > 0$ there exists $\delta > 0$ such that $|\nu(E)| < \epsilon$ whenever
$\mu(E) < \delta$.


Proof. Since $\nu \ll \mu$ iff $|\nu| \ll \mu$ and $|\nu(E)| \le |\nu|(E)$, it suffices to assume
that $\nu = |\nu|$ is positive. Clearly the $\epsilon$-$\delta$ condition implies that $\nu \ll \mu$. On the other
hand, if the $\epsilon$-$\delta$ condition is not satisfied, there exists $\epsilon > 0$ such that for all $n \in \mathbb{N}$
we can find $E_n \in \mathcal{M}$ with $\mu(E_n) < 2^{-n}$ and $\nu(E_n) \ge \epsilon$. Let $F_k = \bigcup_{n=k}^\infty E_n$ and
$F = \bigcap_{k=1}^\infty F_k$. Then $\mu(F_k) < \sum_{n=k}^\infty 2^{-n} = 2^{1-k}$, so $\mu(F) = 0$; but $\nu(F_k) \ge \epsilon$ for all
$k$ and hence, since $\nu$ is finite, $\nu(F) = \lim \nu(F_k) \ge \epsilon$. Thus it is false that $\nu \ll \mu$. ∎


If $\mu$ is a measure and $f$ is an extended $\mu$-integrable function, the signed measure
$\nu$ defined by $\nu(E) = \int_E f\,d\mu$ is clearly absolutely continuous with respect to $\mu$; it is
finite iff $f \in L^1(\mu)$. For any complex-valued $f \in L^1(\mu)$, the preceding theorem can
be applied to $\operatorname{Re} f$ and $\operatorname{Im} f$, and we obtain the following useful result:


3.6 Corollary. If $f \in L^1(\mu)$, for every $\epsilon > 0$ there exists $\delta > 0$ such that
$|\int_E f\,d\mu| < \epsilon$ whenever $\mu(E) < \delta$.


We shall use the following notation to express the relationship $\nu(E) = \int_E f\,d\mu$:

$$d\nu = f\,d\mu.$$

Sometimes, by a slight abuse of language, we shall refer to “the signed measure
$f\,d\mu$.”

We now come to the main theorem of this section, which gives a complete picture 
of the structure of a signed measure relative to a given positive measure. First, a 
technical lemma. 


3.7 Lemma. Suppose that $\nu$ and $\mu$ are finite measures on $(X, \mathcal{M})$. Either $\nu \perp \mu$, or
there exist $\epsilon > 0$ and $E \in \mathcal{M}$ such that $\mu(E) > 0$ and $\nu \ge \epsilon\mu$ on $E$ (that is, $E$ is a
positive set for $\nu - \epsilon\mu$).


Proof. Let $X = P_n \cup N_n$ be a Hahn decomposition for $\nu - n^{-1}\mu$, and let
$P = \bigcup_1^\infty P_n$ and $N = \bigcap_1^\infty N_n = P^c$. Then $N$ is a negative set for $\nu - n^{-1}\mu$ for all
$n$, i.e., $0 \le \nu(N) \le n^{-1}\mu(N)$ for all $n$, so $\nu(N) = 0$. If $\mu(P) = 0$, then $\nu \perp \mu$. If
$\mu(P) > 0$, then $\mu(P_n) > 0$ for some $n$, and $P_n$ is a positive set for $\nu - n^{-1}\mu$. ∎
3.8 The Lebesgue-Radon-Nikodym Theorem. Let $\nu$ be a $\sigma$-finite signed measure
and $\mu$ a $\sigma$-finite positive measure on $(X, \mathcal{M})$. There exist unique $\sigma$-finite signed
measures $\lambda, \rho$ on $(X, \mathcal{M})$ such that

$$\lambda \perp \mu, \qquad \rho \ll \mu, \qquad \text{and} \qquad \nu = \lambda + \rho.$$

Moreover, there is an extended $\mu$-integrable function $f : X \to \mathbb{R}$ such that $d\rho = f\,d\mu$,
and any two such functions are equal $\mu$-a.e.


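Proof. Case I: Suppose that $\nu$ and $\mu$ are finite positive measures. Let

$$\mathcal{F} = \Big\{f : X \to [0, \infty] : \int_E f\,d\mu \le \nu(E) \text{ for all } E \in \mathcal{M}\Big\}.$$

$\mathcal{F}$ is nonempty since $0 \in \mathcal{F}$. Also, if $f, g \in \mathcal{F}$, then $h = \max(f, g) \in \mathcal{F}$, for if
$A = \{x : f(x) > g(x)\}$, for any $E \in \mathcal{M}$ we have

$$\int_E h\,d\mu = \int_{E \cap A} f\,d\mu + \int_{E \setminus A} g\,d\mu \le \nu(E \cap A) + \nu(E \setminus A) = \nu(E).$$

Let $a = \sup\{\int f\,d\mu : f \in \mathcal{F}\}$, noting that $a \le \nu(X) < \infty$, and choose a sequence
$\{f_n\} \subset \mathcal{F}$ such that $\int f_n\,d\mu \to a$. Let $g_n = \max(f_1, \ldots, f_n)$ and $f = \sup_n f_n$.
Then $g_n \in \mathcal{F}$, $g_n$ increases pointwise to $f$, and $\int g_n\,d\mu \ge \int f_n\,d\mu$. It follows that
$\lim \int g_n\,d\mu = a$ and hence, by the monotone convergence theorem, that $f \in \mathcal{F}$
and $\int f\,d\mu = a$. (In particular, $f < \infty$ $\mu$-a.e., so we may take $f$ to be real-valued
everywhere.)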

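We claim that the measure $d\lambda = d\nu - f\,d\mu$ (which is positive since $f \in \mathcal{F}$) is
singular with respect to $\mu$. If not, by Lemma 3.7 there exist $E \in \mathcal{M}$ and $\epsilon > 0$
such that $\mu(E) > 0$ and $\lambda \ge \epsilon\mu$ on $E$. But then $\epsilon\chi_E\,d\mu \le d\lambda = d\nu - f\,d\mu$, that
is, $(f + \epsilon\chi_E)\,d\mu \le d\nu$, so $f + \epsilon\chi_E \in \mathcal{F}$ and $\int (f + \epsilon\chi_E)\,d\mu = a + \epsilon\mu(E) > a$,
contradicting the definition of $a$.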

Thus the existence of $\lambda$, $f$, and $d\rho = f\,d\mu$ is proved. As for uniqueness, if also
$d\nu = d\lambda' + f'\,d\mu$, we have $d\lambda - d\lambda' = (f' - f)\,d\mu$. But $\lambda - \lambda' \perp \mu$ (see Exercise
9), while $(f' - f)\,d\mu \ll \mu$; hence $d\lambda - d\lambda' = (f' - f)\,d\mu = 0$, so that $\lambda = \lambda'$ and
(by Proposition 2.23) $f = f'$ $\mu$-a.e. Thus we are done in the case when $\mu$ and $\nu$ are
finite measures.

Case II: Suppose that $\mu$ and $\nu$ are $\sigma$-finite measures. Then $X$ is a countable
disjoint union of $\mu$-finite sets and a countable disjoint union of $\nu$-finite sets; by taking
intersections of these we obtain a disjoint sequence $\{A_j\} \subset \mathcal{M}$ such that $\mu(A_j)$ and
$\nu(A_j)$ are finite for all $j$ and $X = \bigcup_1^\infty A_j$. Define $\mu_j(E) = \mu(E \cap A_j)$ and
$\nu_j(E) = \nu(E \cap A_j)$. By the reasoning above, for each $j$ we have $d\nu_j = d\lambda_j + f_j\,d\mu_j$ where
$\lambda_j \perp \mu_j$. Since $\mu_j(A_j^c) = \nu_j(A_j^c) = 0$, we have $\lambda_j(A_j^c) = \nu_j(A_j^c) - \int_{A_j^c} f_j\,d\mu_j = 0$,
and we may assume that $f_j = 0$ on $A_j^c$. Let $\lambda = \sum_1^\infty \lambda_j$ and $f = \sum_1^\infty f_j$. Then
$d\nu = d\lambda + f\,d\mu$, $\lambda \perp \mu$ (see Exercise 9), and $d\lambda$ and $f\,d\mu$ are $\sigma$-finite, as desired.
Uniqueness follows as before.

The General Case: If $\nu$ is a signed measure, we apply the preceding argument to
$\nu^+$ and $\nu^-$ and subtract the results. ∎
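When $\mu$ and $\nu$ are given by point masses on a finite set (so $\mathcal{M} = \mathcal{P}(X)$), every ingredient of Theorem 3.8 is explicit. $\lambda$ is the part of $\nu$ carried by the $\mu$-null points, and $f(x) = \nu(\{x\})/\mu(\{x\})$ wherever $\mu(\{x\}) > 0$. The Python sketch below is a finite-set illustration with hypothetical names, not the general construction used in the proof. It computes $\lambda$ and $f$ in exact rational arithmetic and checks $\nu = \lambda + f\,d\mu$ on every subset.

```python
from fractions import Fraction
from itertools import chain, combinations


def subsets(points):
    """All subsets of a finite set, as tuples."""
    pts = list(points)
    return chain.from_iterable(combinations(pts, k) for k in range(len(pts) + 1))


def lebesgue_radon_nikodym(nu, mu):
    """Split a discrete signed measure nu against a discrete positive measure mu.

    Returns (lam, f): lam is the part of nu on mu-null points (hence lam ⊥ mu),
    and f is the density of the absolutely continuous part rho = f d(mu).
    """
    lam = {x: (nu[x] if mu[x] == 0 else 0) for x in nu}
    f = {x: (Fraction(nu[x]) / mu[x] if mu[x] != 0 else Fraction(0)) for x in nu}
    return lam, f


mu = {"a": 2, "b": 1, "c": 0}
nu = {"a": 3, "b": -1, "c": 5}
lam, f = lebesgue_radon_nikodym(nu, mu)
for E in subsets(nu):
    # nu(E) = lam(E) + integral over E of f d(mu)
    assert sum(nu[x] for x in E) == sum(lam[x] for x in E) + sum(f[x] * mu[x] for x in E)
print(lam, f["a"], f["b"])  # {'a': 0, 'b': 0, 'c': 5} 3/2 -1
```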
The decomposition $\nu = \lambda + \rho$ where $\lambda \perp \mu$ and $\rho \ll \mu$ is called the Lebesgue
decomposition of $\nu$ with respect to $\mu$. In the case where $\nu \ll \mu$, Theorem 3.8 says
that $d\nu = f\,d\mu$ for some $f$. This result is usually known as the Radon-Nikodym
theorem, and $f$ is called the Radon-Nikodym derivative of $\nu$ with respect to $\mu$. We
denote it by $d\nu/d\mu$:

$$d\nu = \frac{d\nu}{d\mu}\,d\mu.$$

(Strictly speaking, $d\nu/d\mu$ should be construed as the class of functions equal
to $f$ $\mu$-a.e.) The formulas suggested by the differential notation $d\nu/d\mu$ are generally
correct. For example, it is obvious that $d(\nu_1 + \nu_2)/d\mu = (d\nu_1/d\mu) + (d\nu_2/d\mu)$, and
we have the chain rule:


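3.9 Proposition. Suppose that $\nu$ is a $\sigma$-finite signed measure and $\mu, \lambda$ are $\sigma$-finite
measures on $(X, \mathcal{M})$ such that $\nu \ll \mu$ and $\mu \ll \lambda$.

a. If $g \in L^1(\nu)$, then $g(d\nu/d\mu) \in L^1(\mu)$ and

$$\int g\,d\nu = \int g\,\frac{d\nu}{d\mu}\,d\mu.$$

b. We have $\nu \ll \lambda$, and

$$\frac{d\nu}{d\lambda} = \frac{d\nu}{d\mu}\,\frac{d\mu}{d\lambda} \quad \lambda\text{-a.e.}$$


Proof. By considering $\nu^+$ and $\nu^-$ separately, we may assume that $\nu \ge 0$. The
equation $\int g\,d\nu = \int g\,(d\nu/d\mu)\,d\mu$ is true when $g = \chi_E$ by definition of $d\nu/d\mu$. It
is therefore true for simple functions by linearity, then for nonnegative measurable
functions by the monotone convergence theorem, and finally for functions in $L^1(\nu)$
by linearity again. Replacing $\nu, \mu$ by $\mu, \lambda$ and setting $g = \chi_E\,(d\nu/d\mu)$, we obtain

$$\nu(E) = \int_E \frac{d\nu}{d\mu}\,d\mu = \int_E \frac{d\nu}{d\mu}\,\frac{d\mu}{d\lambda}\,d\lambda$$

for all $E \in \mathcal{M}$, whence $(d\nu/d\lambda) = (d\nu/d\mu)(d\mu/d\lambda)$ $\lambda$-a.e. by Proposition 2.23. ∎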
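For point masses on a finite set, Radon-Nikodym derivatives are ratios of point masses, so Proposition 3.9 can be checked by direct computation. In the hypothetical helper below, `rn_derivative` returns $d\nu/d\mu$ as a dictionary and raises an error if $\nu \ll \mu$ fails. On $\mu$-null points, where the derivative is determined only up to a null set, it sets the value to 0. This is a finite-set sketch, not a general algorithm.

```python
from fractions import Fraction


def rn_derivative(nu, mu):
    """d(nu)/d(mu) for point masses on a finite set, assuming nu << mu."""
    for x in nu:
        if mu[x] == 0 and nu[x] != 0:
            raise ValueError(f"nu is not absolutely continuous w.r.t. mu at {x!r}")
    # Defined only mu-a.e.; we choose the value 0 on mu-null points.
    return {x: (Fraction(nu[x]) / mu[x] if mu[x] != 0 else Fraction(0)) for x in nu}


def integral(g, measure):
    """Integral of g with respect to a measure given by point masses."""
    return sum(g[x] * measure[x] for x in measure)


lam = {"a": 1, "b": 2, "c": 4, "d": 1}   # lam > 0 everywhere, so mu << lam
mu = {"a": 2, "b": 0, "c": 1, "d": 3}
nu = {"a": -4, "b": 0, "c": 3, "d": 0}   # nu vanishes where mu does, so nu << mu

dnu_dmu = rn_derivative(nu, mu)
dmu_dlam = rn_derivative(mu, lam)
dnu_dlam = rn_derivative(nu, lam)

# Proposition 3.9b: the chain rule (here it holds at every point, since lam > 0).
assert all(dnu_dlam[x] == dnu_dmu[x] * dmu_dlam[x] for x in lam)

# Proposition 3.9a: integral of g d(nu) = integral of g (d(nu)/d(mu)) d(mu).
g = {"a": 1, "b": 5, "c": 2, "d": 7}
assert integral(g, nu) == integral({x: g[x] * dnu_dmu[x] for x in mu}, mu)
print(integral(g, nu))  # 2
```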
3.10 Corollary. If $\mu \ll \lambda$ and $\lambda \ll \mu$, then $(d\lambda/d\mu)(d\mu/d\lambda) = 1$ a.e. (with respect
to either $\lambda$ or $\mu$).


Nonexample: Let $\mu$ be Lebesgue measure and $\nu$ the point mass at 0 on $(\mathbb{R}, \mathcal{B}_{\mathbb{R}})$.
Clearly $\nu \perp \mu$. The nonexistent Radon-Nikodym derivative $d\nu/d\mu$ is popularly
known as the Dirac $\delta$-function.


We conclude this section with a simple but important observation: 


3.11 Proposition. If $\mu_1, \ldots, \mu_n$ are measures on $(X, \mathcal{M})$, there is a measure $\mu$ such
that $\mu_j \ll \mu$ for all $j$, namely, $\mu = \sum_1^n \mu_j$.


The proof is trivial. 
Exercises

8. $\nu \ll \mu$ iff $|\nu| \ll \mu$ iff $\nu^+ \ll \mu$ and $\nu^- \ll \mu$.

9. Suppose $\{\nu_j\}$ is a sequence of positive measures. If $\nu_j \perp \mu$ for all $j$, then
$\sum_1^\infty \nu_j \perp \mu$; and if $\nu_j \ll \mu$ for all $j$, then $\sum_1^\infty \nu_j \ll \mu$.

10. Theorem 3.5 may fail when $\nu$ is not finite. (Consider $d\nu(x) = dx/x$ and
$d\mu(x) = dx$ on $(0, 1)$, or $\nu$ = counting measure and $\mu(E) = \sum_{n \in E} 2^{-n}$ on $\mathbb{N}$.)


11. Let $\mu$ be a positive measure. A collection of functions $\{f_\alpha\}_{\alpha \in A} \subset L^1(\mu)$
is called uniformly integrable if for every $\epsilon > 0$ there exists $\delta > 0$ such that
$|\int_E f_\alpha\,d\mu| < \epsilon$ for all $\alpha \in A$ whenever $\mu(E) < \delta$.

a. Any finite subset of $L^1(\mu)$ is uniformly integrable.

b. If $\{f_n\}$ is a sequence in $L^1(\mu)$ that converges in the $L^1$ metric to $f \in L^1(\mu)$,
then $\{f_n\}$ is uniformly integrable.


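12. For $j = 1, 2$, let $\mu_j, \nu_j$ be $\sigma$-finite measures on $(X_j, \mathcal{M}_j)$ such that $\nu_j \ll \mu_j$.
Then $\nu_1 \times \nu_2 \ll \mu_1 \times \mu_2$ and

$$\frac{d(\nu_1 \times \nu_2)}{d(\mu_1 \times \mu_2)}(x_1, x_2) = \frac{d\nu_1}{d\mu_1}(x_1)\,\frac{d\nu_2}{d\mu_2}(x_2).$$

13. Let $X = [0, 1]$, $\mathcal{M} = \mathcal{B}_{[0,1]}$, $m$ = Lebesgue measure, and $\mu$ = counting measure
on $\mathcal{M}$.
a. $m \ll \mu$ but $dm \ne f\,d\mu$ for any $f$.
b. $\mu$ has no Lebesgue decomposition with respect to $m$.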


14. If $\nu$ is an arbitrary signed measure and $\mu$ is a $\sigma$-finite measure on $(X, \mathcal{M})$ such
that $\nu \ll \mu$, there exists an extended $\mu$-integrable function $f : X \to [-\infty, \infty]$ such
that $d\nu = f\,d\mu$. Hints:
a. It suffices to assume that $\mu$ is finite and $\nu$ is positive.
b. With these assumptions, there exists $E \in \mathcal{M}$ that is $\sigma$-finite for $\nu$ such that
$\mu(E) \ge \mu(F)$ for all sets $F$ that are $\sigma$-finite for $\nu$.
c. The Radon-Nikodym theorem applies on $E$. If $F \cap E = \varnothing$, then either
$\nu(F) = \mu(F) = 0$ or $\mu(F) > 0$ and $|\nu(F)| = \infty$.


15. A measure $\mu$ on $(X, \mathcal{M})$ is called decomposable if there is a family $\mathcal{F} \subset \mathcal{M}$ with
the following properties: (i) $\mu(F) < \infty$ for all $F \in \mathcal{F}$; (ii) the members of $\mathcal{F}$ are
disjoint and their union is $X$; (iii) if $\mu(E) < \infty$ then $\mu(E) = \sum_{F \in \mathcal{F}} \mu(E \cap F)$; (iv)
if $E \subset X$ and $E \cap F \in \mathcal{M}$ for all $F \in \mathcal{F}$ then $E \in \mathcal{M}$.
a. Every $\sigma$-finite measure is decomposable.
b. If $\mu$ is decomposable and $\nu$ is any signed measure on $(X, \mathcal{M})$ such that $\nu \ll \mu$,
there exists a measurable $f : X \to [-\infty, \infty]$ such that $\nu(E) = \int_E f\,d\mu$ for any
$E$ that is $\sigma$-finite for $\mu$, and $|f| < \infty$ on any $F \in \mathcal{F}$ that is $\sigma$-finite for $\nu$. (Use
Exercise 14 if $\nu$ is not $\sigma$-finite.)


16. Suppose that $\mu, \nu$ are measures on $(X, \mathcal{M})$ with $\nu \ll \mu$, and let $\lambda = \mu + \nu$. If
$f = d\nu/d\lambda$, then $0 \le f < 1$ $\mu$-a.e. and $d\nu/d\mu = f/(1 - f)$.
17. Let $(X, \mathcal{M}, \mu)$ be a $\sigma$-finite measure space, $\mathcal{N}$ a sub-$\sigma$-algebra of $\mathcal{M}$, and
$\nu = \mu|\mathcal{N}$. If $f \in L^1(\mu)$, there exists $g \in L^1(\nu)$ (thus $g$ is $\mathcal{N}$-measurable) such that
$\int_E f\,d\mu = \int_E g\,d\nu$ for all $E \in \mathcal{N}$; if $g'$ is another such function then $g = g'$ $\nu$-a.e.
(In probability theory, $g$ is called the conditional expectation of $f$ on $\mathcal{N}$.)
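In the simplest case, $X$ is finite and $\mathcal{N}$ consists of all unions of the cells of a partition of $X$. Then the function $g$ of Exercise 17 can be written down directly: on each cell $C$ with $\mu(C) > 0$ it is the constant $\mu(C)^{-1} \sum_{x \in C} f(x)\,\mu(\{x\})$. The Python sketch below covers only this finite case, and the name `conditional_expectation` is hypothetical. It computes these averages and checks the defining identity on every $E \in \mathcal{N}$.

```python
from fractions import Fraction
from itertools import chain, combinations


def conditional_expectation(f, mu, cells):
    """g = E[f | N] when N is generated by a partition of a finite set into cells.

    On a cell C with mu(C) > 0, g is the mu-weighted average of f over C;
    on a mu-null cell any value works, and we use 0.
    """
    g = {}
    for cell in cells:
        mass = sum(mu[x] for x in cell)
        avg = sum(Fraction(f[x]) * mu[x] for x in cell) / mass if mass else Fraction(0)
        for x in cell:
            g[x] = avg
    return g


mu = {1: 1, 2: 3, 3: 2, 4: 2}
f = {1: 4, 2: 0, 3: 1, 4: 5}
cells = [{1, 2}, {3, 4}]
g = conditional_expectation(f, mu, cells)

# Every E in N is a union of cells; check the defining identity on each of them.
for chosen in chain.from_iterable(combinations(cells, k) for k in range(len(cells) + 1)):
    E = set().union(*chosen)
    assert sum(f[x] * mu[x] for x in E) == sum(g[x] * mu[x] for x in E)
print([str(g[x]) for x in sorted(g)])  # ['1', '1', '3', '3']
```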


3.3 COMPLEX MEASURES 


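A complex measure on a measurable space $(X, \mathcal{M})$ is a map $\nu : \mathcal{M} \to \mathbb{C}$ such that
• $\nu(\varnothing) = 0$;


• if $\{E_j\}$ is a sequence of disjoint sets in $\mathcal{M}$, then $\nu(\bigcup_1^\infty E_j) = \sum_1^\infty \nu(E_j)$,
where the series converges absolutely.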


In particular, infinite values are not allowed, so a positive measure is a complex
measure only if it is finite. Example: If $\mu$ is a positive measure and $f \in L^1(\mu)$, then
$f\,d\mu$ is a complex measure.

If $\nu$ is a complex measure, we shall write $\nu_r$ and $\nu_i$ for the real and imaginary
parts of $\nu$. Thus $\nu_r$ and $\nu_i$ are signed measures that do not assume the values $\pm\infty$;
hence they are finite, and so the range of $\nu$ is a bounded subset of $\mathbb{C}$.

The notions we have developed for signed measures generalize easily to complex
measures. For example, we define $L^1(\nu)$ to be $L^1(\nu_r) \cap L^1(\nu_i)$, and for $f \in L^1(\nu)$,
we set $\int f\,d\nu = \int f\,d\nu_r + i\int f\,d\nu_i$. If $\nu$ and $\mu$ are complex measures, we say that
$\nu \perp \mu$ if $\nu_a \perp \mu_b$ for $a, b = r, i$, and if $\lambda$ is a positive measure, we say that $\nu \ll \lambda$ if
$\nu_r \ll \lambda$ and $\nu_i \ll \lambda$. The theorems of §3.2 also generalize; one has merely to apply
them to the real and imaginary parts separately. In particular:


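3.12 The Lebesgue-Radon-Nikodym Theorem. If $\nu$ is a complex measure and $\mu$
is a $\sigma$-finite positive measure on $(X, \mathcal{M})$, there exist a complex measure $\lambda$ and an
$f \in L^1(\mu)$ such that $\lambda \perp \mu$ and $d\nu = d\lambda + f\,d\mu$. If also $\lambda' \perp \mu$ and $d\nu = d\lambda' + f'\,d\mu$,
then $\lambda = \lambda'$ and $f = f'$ $\mu$-a.e.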


As before, if $\nu \ll \mu$, we denote the $f$ in Theorem 3.12 by $d\nu/d\mu$.

The total variation of a complex measure $\nu$ is the positive measure $|\nu|$ determined
by the property that if $d\nu = f\,d\mu$ where $\mu$ is a positive measure, then $d|\nu| = |f|\,d\mu$.
To see that this is well defined, we observe first that every $\nu$ is of the form $f\,d\mu$ for
some finite measure $\mu$ and some $f \in L^1(\mu)$; indeed, we can take $\mu = |\nu_r| + |\nu_i|$ and
use Theorem 3.12 to obtain $f$. Second, if $d\nu = f_1\,d\mu_1 = f_2\,d\mu_2$, let $\rho = \mu_1 + \mu_2$.
Then by Proposition 3.9,

$$f_1\,\frac{d\mu_1}{d\rho}\,d\rho = d\nu = f_2\,\frac{d\mu_2}{d\rho}\,d\rho,$$

so that $f_1(d\mu_1/d\rho) = f_2(d\mu_2/d\rho)$ $\rho$-a.e. Since $d\mu_j/d\rho$ is nonnegative, we therefore
have

$$|f_1|\,\frac{d\mu_1}{d\rho} = \Big|f_1\,\frac{d\mu_1}{d\rho}\Big| = \Big|f_2\,\frac{d\mu_2}{d\rho}\Big| = |f_2|\,\frac{d\mu_2}{d\rho},$$

and thus

$$|f_1|\,d\mu_1 = |f_1|\,\frac{d\mu_1}{d\rho}\,d\rho = |f_2|\,\frac{d\mu_2}{d\rho}\,d\rho = |f_2|\,d\mu_2.$$

Hence the definition of $|\nu|$ is independent of the choice of $\mu$ and $f$. This definition
agrees with the previous definition of $|\nu|$ when $\nu$ is a signed measure, for in that
case $d\nu = (\chi_P - \chi_N)\,d|\nu|$ where $X = P \cup N$ is a Hahn decomposition, and

$$|\chi_P - \chi_N| = 1.$$


3.13 Proposition. Let $\nu$ be a complex measure on $(X, \mathcal{M})$.

a. $|\nu(E)| \le |\nu|(E)$ for all $E \in \mathcal{M}$.

b. $\nu \ll |\nu|$, and $d\nu/d|\nu|$ has absolute value 1 $|\nu|$-a.e.

c. $L^1(\nu) = L^1(|\nu|)$, and if $f \in L^1(\nu)$, then $|\int f\,d\nu| \le \int |f|\,d|\nu|$.


Proof. Suppose $d\nu = f\,d\mu$ as in the definition of $|\nu|$. Then

$$|\nu(E)| = \Big|\int_E f\,d\mu\Big| \le \int_E |f|\,d\mu = |\nu|(E).$$

This proves (a) and shows that $\nu \ll |\nu|$. If $g = d\nu/d|\nu|$, then
$f\,d\mu = d\nu = g\,d|\nu| = g|f|\,d\mu$, so $g|f| = f$ $\mu$-a.e. and hence $|\nu|$-a.e. But clearly $|f| > 0$ $|\nu|$-a.e.,
whence $|g| = 1$ $|\nu|$-a.e. Part (c) is left to the reader (Exercise 18). ∎





3.14 Proposition. If $\nu_1, \nu_2$ are complex measures on $(X, \mathcal{M})$, then
$|\nu_1 + \nu_2| \le |\nu_1| + |\nu_2|$.


Proof. By Proposition 3.11 we can write $d\nu_j = f_j\,d\mu$, with the same $\mu$, for
$j = 1, 2$. But then $d|\nu_1 + \nu_2| = |f_1 + f_2|\,d\mu \le |f_1|\,d\mu + |f_2|\,d\mu = d|\nu_1| + d|\nu_2|$. ∎
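Consider a complex measure with point masses $z_x = \nu(\{x\})$ on a finite set. Taking $\mu$ to be counting measure gives $f(x) = z_x$, hence $|\nu|(\{x\}) = |z_x|$. The Python sketch below is a finite-set illustration with hypothetical helper names. It also computes $|\nu|$ from a second representation $d\nu = f_2\,d\mu_2$, illustrating that the definition does not depend on $\mu$. It then checks Proposition 3.13a,b and Proposition 3.14 numerically.

```python
import math
from itertools import chain, combinations


def subsets(points):
    """All subsets of a finite set, as tuples."""
    pts = list(points)
    return chain.from_iterable(combinations(pts, k) for k in range(len(pts) + 1))


def total_variation(f, mu):
    """|nu| for d(nu) = f d(mu) on a finite set: |nu|({x}) = |f(x)| mu({x})."""
    return {x: abs(f[x]) * mu[x] for x in mu}


z = {"a": 3 + 4j, "b": -1j, "c": 2 + 0j}           # nu({x}) = z[x]
counting = {x: 1.0 for x in z}
mu2 = {"a": 2.0, "b": 0.5, "c": 4.0}               # another positive measure
f2 = {x: z[x] / mu2[x] for x in z}                 # so that d(nu) = f2 d(mu2)

tv = total_variation(z, counting)
tv2 = total_variation(f2, mu2)
assert all(math.isclose(tv[x], tv2[x]) for x in z)  # independent of the choice of mu

for E in subsets(z):
    # Proposition 3.13a: |nu(E)| <= |nu|(E)
    assert abs(sum(z[x] for x in E)) <= sum(tv[x] for x in E) + 1e-12

# Proposition 3.13b: d(nu)/d|nu| = z/|z| has modulus 1 wherever |nu| > 0
assert all(math.isclose(abs(z[x] / tv[x]), 1.0) for x in z if tv[x] > 0)

# Proposition 3.14: |nu1 + nu2| <= |nu1| + |nu2|, checked point by point
w = {"a": -3 + 0j, "b": 1j, "c": 1 + 0j}
tv_sum = total_variation({x: z[x] + w[x] for x in z}, counting)
tv_w = total_variation(w, counting)
assert all(tv_sum[x] <= tv[x] + tv_w[x] for x in z)
print(tv)  # {'a': 5.0, 'b': 1.0, 'c': 2.0}
```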


Exercises 


18. Prove Proposition 3.13c. 





19. … and $\nu \ll \lambda$ iff $|\nu| \ll \lambda$.
20. If $\nu$ is a complex measure on $(X, \mathcal{M})$ and $\nu(X) = |\nu|(X)$, then $\nu = |\nu|$.
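21. Let $\nu$ be a complex measure on $(X, \mathcal{M})$. If $E \in \mathcal{M}$, define

$$\mu_1(E) = \sup\Big\{\sum_1^n |\nu(E_j)| : n \in \mathbb{N},\ E_1, \ldots, E_n \text{ disjoint},\ E = \bigcup_1^n E_j\Big\},$$

$$\mu_2(E) = \sup\Big\{\sum_1^\infty |\nu(E_j)| : E_1, E_2, \ldots \text{ disjoint},\ E = \bigcup_1^\infty E_j\Big\},$$

$$\mu_3(E) = \sup\Big\{\Big|\int_E f\,d\nu\Big| : |f| \le 1\Big\}.$$

Then $\mu_1 = \mu_2 = \mu_3 = |\nu|$. (First show that $\mu_1 \le \mu_2 \le \mu_3$. To see that $\mu_3 = |\nu|$,
let $f = \overline{d\nu/d|\nu|}$ and apply Proposition 3.13. To see that $\mu_3 \le \mu_1$, approximate $f$ by
simple functions.)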


3.4 DIFFERENTIATION ON EUCLIDEAN SPACE 


The Radon-Nikodym theorem provides an abstract notion of the “derivative” of a
signed or complex measure $\nu$ with respect to a measure $\mu$. In this section we analyze
more deeply the special case where $(X, \mathcal{M}) = (\mathbb{R}^n, \mathcal{B}_{\mathbb{R}^n})$ and $\mu = m$ is Lebesgue
measure. Here one can define a pointwise derivative of $\nu$ with respect to $m$ in the
following way. Let $B(r, x)$ be the open ball of radius $r$ about $x$ in $\mathbb{R}^n$; then one can
consider the limit

$$F(x) = \lim_{r \to 0} \frac{\nu(B(r, x))}{m(B(r, x))}$$

when it exists. (One can also replace the balls $B(r, x)$ by other sets which, in a
suitable sense, shrink to $x$ in a regular way; we shall examine this point later.) If
$\nu \ll m$, so that $d\nu = f\,dm$, then $\nu(B(r, x))/m(B(r, x))$ is simply the average value
of $f$ on $B(r, x)$, so one would hope that $F = f$ $m$-a.e. This turns out to be the case
provided that $\nu(B(r, x))$ is finite for all $r, x$. From the point of view of the function
$f$, this may be regarded as a generalization of the fundamental theorem of calculus:
The derivative of the indefinite integral of $f$ (namely, $\nu$) is $f$.

For the remainder of this section, terms such as “integrable” and “almost everywhere”
refer to Lebesgue measure unless otherwise specified. We begin our analysis
with a technical lemma that is of interest in its own right.


3.15 Lemma. Let $\mathcal{C}$ be a collection of open balls in $\mathbb{R}^n$, and let $U = \bigcup_{B \in \mathcal{C}} B$. If
$c < m(U)$, there exist disjoint $B_1, \ldots, B_k \in \mathcal{C}$ such that $\sum_1^k m(B_j) > 3^{-n}c$.


Proof. If $c < m(U)$, by Theorem 2.40 there is a compact $K \subset U$ with $m(K) > c$,
and finitely many of the balls in $\mathcal{C}$, say $A_1, \ldots, A_m$, cover $K$. Let $B_1$ be the
largest of the $A_i$'s (that is, choose $B_1$ to have maximal radius), let $B_2$ be the largest of
the $A_i$'s that are disjoint from $B_1$, $B_3$ the largest of the $A_i$'s that are disjoint from $B_1$
and $B_2$, and so on until the list of $A_i$'s is exhausted. According to this construction,
if $A_i$ is not one of the $B_j$'s, there is a $j$ such that $A_i \cap B_j \ne \varnothing$, and if $j$ is the smallest
integer with this property, the radius of $A_i$ is at most that of $B_j$. Hence $A_i \subset B_j^*$,
where $B_j^*$ is the ball concentric with $B_j$ whose radius is three times that of $B_j$. But
then $K \subset \bigcup_1^k B_j^*$, so

$$c < m(K) \le \sum_1^k m(B_j^*) = 3^n \sum_1^k m(B_j).$$

∎
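The selection in this proof is a greedy algorithm: scan the balls in order of decreasing radius and keep each one that misses all balls kept so far. The Python sketch below carries this out in dimension $n = 1$, where balls are open intervals given as (center, radius) pairs. It is an illustration with hypothetical names, not part of the text. It checks the two facts used above: every ball lies in the threefold enlargement $B_j^*$ of some selected ball, and $m(U) \le 3 \sum m(B_j)$.

```python
def vitali_select(balls):
    """Greedy selection from the proof of Lemma 3.15, in dimension n = 1.

    balls: list of (center, radius) pairs for open intervals. Returns a
    pairwise disjoint sublist, scanned largest radius first.
    """
    chosen = []
    for c, r in sorted(balls, key=lambda b: b[1], reverse=True):
        # Open intervals (c - r, c + r) and (c2 - r2, c2 + r2) are disjoint iff |c - c2| >= r + r2.
        if all(abs(c - c2) >= r + r2 for c2, r2 in chosen):
            chosen.append((c, r))
    return chosen


def union_length(balls):
    """Lebesgue measure of a finite union of open intervals."""
    total, cur_lo, cur_hi = 0.0, None, None
    for lo, hi in sorted((c - r, c + r) for c, r in balls):
        if cur_hi is None or lo > cur_hi:      # start a new connected block
            if cur_hi is not None:
                total += cur_hi - cur_lo
            cur_lo, cur_hi = lo, hi
        else:                                  # overlaps (or touches) the current block
            cur_hi = max(cur_hi, hi)
    if cur_hi is not None:
        total += cur_hi - cur_lo
    return total


balls = [(0, 1), (1.5, 1), (3, 0.5), (10, 2), (11, 0.5), (5, 0.2)]
chosen = vitali_select(balls)

# Each ball A lies in B* = (c_j - 3 r_j, c_j + 3 r_j) for some selected B_j.
assert all(any(abs(c - cj) + r <= 3 * rj for cj, rj in chosen) for c, r in balls)
# Hence m(U) <= 3 * sum of m(B_j) (the constant is 3**n with n = 1).
assert union_length(balls) <= 3 * sum(2 * r for _, r in chosen)
print(chosen)  # [(10, 2), (0, 1), (3, 0.5), (5, 0.2)]
```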


A measurable function $f : \mathbb{R}^n \to \mathbb{C}$ is called locally integrable (with respect to
Lebesgue measure) if $\int_K |f(x)|\,dx < \infty$ for every bounded measurable set $K \subset \mathbb{R}^n$.
We denote the space of locally integrable functions by $L^1_{\mathrm{loc}}$. If $f \in L^1_{\mathrm{loc}}$, $x \in \mathbb{R}^n$,
and $r > 0$, we define $A_r f(x)$ to be the average value of $f$ on $B(r, x)$:

$$A_r f(x) = \frac{1}{m(B(r, x))} \int_{B(r, x)} f(y)\,dy.$$


3.16 Lemma. If $f \in L^1_{\mathrm{loc}}$, $A_r f(x)$ is jointly continuous in $r$ and $x$ ($r > 0$, $x \in \mathbb{R}^n$).


Proof. From the results in §2.7 we know that $m(B(r, x)) = cr^n$ where
$c = m(B(1, 0))$, and $m(S(r, x)) = 0$ where $S(r, x) = \{y : |y - x| = r\}$. Moreover,
as $r \to r_0$ and $x \to x_0$, $\chi_{B(r, x)} \to \chi_{B(r_0, x_0)}$ pointwise on $\mathbb{R}^n \setminus S(r_0, x_0)$. Hence
$\chi_{B(r, x)} \to \chi_{B(r_0, x_0)}$ a.e., and $|\chi_{B(r, x)}| \le \chi_{B(r_0 + 1, x_0)}$ if $r < r_0 + \frac12$ and $|x - x_0| < \frac12$.
By the dominated convergence theorem, it follows that $\int_{B(r, x)} f(y)\,dy$ is continuous
in $r$ and $x$, and hence so is $A_r f(x) = c^{-1} r^{-n} \int_{B(r, x)} f(y)\,dy$. ∎


Next, if $f \in L^1_{\mathrm{loc}}$, we define its Hardy-Littlewood maximal function $Hf$ by

$$Hf(x) = \sup_{r > 0} A_r|f|(x) = \sup_{r > 0} \frac{1}{m(B(r, x))} \int_{B(r, x)} |f(y)|\,dy.$$

$Hf$ is measurable, for $(Hf)^{-1}((a, \infty)) = \bigcup_{r > 0} (A_r|f|)^{-1}((a, \infty))$ is open for any
$a \in \mathbb{R}$, by Lemma 3.16.


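3.17 The Maximal Theorem. There is a constant $C > 0$ such that for all $f \in L^1$
and all $\alpha > 0$,

$$m(\{x : Hf(x) > \alpha\}) \le \frac{C}{\alpha} \int |f(x)|\,dx.$$

Proof. Let $E_\alpha = \{x : Hf(x) > \alpha\}$. For each $x \in E_\alpha$ we can choose $r_x > 0$
such that $A_{r_x}|f|(x) > \alpha$. The balls $B(r_x, x)$ cover $E_\alpha$, so by Lemma 3.15, if
$c < m(E_\alpha)$ there exist $x_1, \ldots, x_k \in E_\alpha$ such that the balls $B_j = B(r_{x_j}, x_j)$ are
disjoint and $\sum_1^k m(B_j) > 3^{-n}c$. But then

$$c < 3^n \sum_1^k m(B_j) \le \frac{3^n}{\alpha} \sum_1^k \int_{B_j} |f(y)|\,dy \le \frac{3^n}{\alpha} \int_{\mathbb{R}^n} |f(y)|\,dy.$$

Letting $c \to m(E_\alpha)$, we obtain the desired result. ∎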
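For a concrete check in dimension $n = 1$ (so $C = 3$), take $f = \chi_{[0,1]}$. A short calculation gives $Hf = 1$ on $(0, 1)$ and $Hf(x) = 1/(2x)$ for $x \ge 1$, the supremum being attained at $r = x$. The symmetric formula $1/(2(1 - x))$ holds for $x \le 0$. Thus for $\alpha = 1/4$ the set $\{Hf > \alpha\}$ is $(-1, 2)$, of measure 3, against the bound $C\alpha^{-1}\int|f| = 12$. The Python sketch below approximates $Hf$ by a maximum over a finite grid of radii and estimates the measure on a grid of points. The grids and names are illustrative choices, not from the text.

```python
def average_of_indicator(x, r, a=0.0, b=1.0):
    """A_r f(x) in dimension 1 for f = indicator of [a, b]: |[a, b] ∩ (x - r, x + r)| / (2r)."""
    overlap = max(0.0, min(b, x + r) - max(a, x - r))
    return overlap / (2 * r)


def maximal_function(x, radii):
    """Approximate Hf(x) = sup_r A_r|f|(x) by a maximum over a finite list of radii."""
    return max(average_of_indicator(x, r) for r in radii)


radii = [k / 100 for k in range(1, 501)]      # r = 0.01, 0.02, ..., 5.0
xs = [k / 100 for k in range(-200, 301)]      # grid on [-2, 3], spacing 0.01
alpha = 0.25
estimate = 0.01 * sum(1 for x in xs if maximal_function(x, radii) > alpha)
bound = 3 / alpha * 1.0                       # C / alpha * integral of |f|, with C = 3
print(round(estimate, 2), bound)              # 2.99 12.0
```

The estimate falls slightly short of 3 only because the open endpoints $-1$ and $2$ are excluded and the grid is discrete.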


With this tool in hand, we now present three successively sharper versions of the
fundamental differentiation theorem. In the proofs we shall use the notion of limit
superior for real-valued functions of a real variable,

$$\limsup_{r \to R} \phi(r) = \lim_{\epsilon \to 0} \sup_{0 < |r - R| < \epsilon} \phi(r) = \inf_{\epsilon > 0} \sup_{0 < |r - R| < \epsilon} \phi(r),$$

and the easily verified fact that

$$\lim_{r \to R} \phi(r) = c \quad \text{iff} \quad \limsup_{r \to R} |\phi(r) - c| = 0.$$
3.18 Theorem. If $f \in L^1_{\mathrm{loc}}$, then $\lim_{r \to 0} A_r f(x) = f(x)$ for a.e. $x \in \mathbb{R}^n$.


Proof. It suffices to show that for $N \in \mathbb{N}$, $A_r f(x) \to f(x)$ for a.e. $x$ with
$|x| < N$. But for $|x| < N$ and $r < 1$ the values $A_r f(x)$ depend only on the values
$f(y)$ for $|y| < N + 1$, so by replacing $f$ with $f\chi_{B(N+1, 0)}$ we may assume that
$f \in L^1$.

Given $\epsilon > 0$, by Theorem 2.41 we can find a continuous integrable function $g$
such that $\int |g(y) - f(y)|\,dy < \epsilon$. Continuity of $g$ implies that for every $x \in \mathbb{R}^n$ and
$\delta > 0$ there exists $r > 0$ such that $|g(y) - g(x)| < \delta$ whenever $|y - x| < r$, and
hence

$$|A_r g(x) - g(x)| = \frac{1}{m(B(r, x))} \Big|\int_{B(r, x)} [g(y) - g(x)]\,dy\Big| < \delta.$$

Therefore $A_r g(x) \to g(x)$ as $r \to 0$ for every $x$, so

$$\begin{aligned}
\limsup_{r \to 0} |A_r f(x) - f(x)| &= \limsup_{r \to 0} |A_r(f - g)(x) + (A_r g - g)(x) + (g - f)(x)| \\
&\le H(f - g)(x) + 0 + |f - g|(x).
\end{aligned}$$
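Hence, if

$$E_\alpha = \{x : \limsup_{r\to 0} |A_r f(x) - f(x)| > \alpha\}, \qquad F_\alpha = \{x : |f - g|(x) > \alpha\},$$

we have

$$E_\alpha \subset F_{\alpha/2} \cup \{x : H(f - g)(x) > \alpha/2\}.$$

But $(\alpha/2)\, m(F_{\alpha/2}) \le \int_{F_{\alpha/2}} |f(x) - g(x)|\,dx < \epsilon$, so by the maximal theorem,

$$m(E_\alpha) \le \frac{2\epsilon}{\alpha} + \frac{2C\epsilon}{\alpha}.$$

Since $\epsilon$ is arbitrary, $m(E_\alpha) = 0$ for all $\alpha > 0$. But $\lim_{r\to 0} A_r f(x) = f(x)$ for all $x \notin \bigcup_1^\infty E_{1/n}$, so we are done. ∎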


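This result can be rephrased as follows: If $f \in L^1_{\mathrm{loc}}$, then

$$\lim_{r\to 0} \frac{1}{m(B(r,x))} \int_{B(r,x)} [f(y) - f(x)]\,dy = 0 \quad \text{for a.e. } x. \tag{3.19}$$

Actually, something stronger is true: (3.19) remains valid if one replaces the integrand by its absolute value. That is, let us define the Lebesgue set $L_f$ of $f$ to be

$$L_f = \left\{ x : \lim_{r\to 0} \frac{1}{m(B(r,x))} \int_{B(r,x)} |f(y) - f(x)|\,dy = 0 \right\}.$$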
3.20 Theorem. If $f \in L^1_{\mathrm{loc}}$, then $m((L_f)^c) = 0$.

Proof. For each $c \in \mathbb{C}$ we can apply Theorem 3.18 to $g_c(x) = |f(x) - c|$ to conclude that, except on a Lebesgue null set $E_c$, we have

$$\lim_{r\to 0} \frac{1}{m(B(r,x))} \int_{B(r,x)} |f(y) - c|\,dy = |f(x) - c|.$$

Let $D$ be a countable dense subset of $\mathbb{C}$, and let $E = \bigcup_{c\in D} E_c$. Then $m(E) = 0$, and if $x \notin E$, for any $\epsilon > 0$ we can choose $c \in D$ with $|f(x) - c| < \epsilon$, so that $|f(y) - f(x)| < |f(y) - c| + \epsilon$, and it follows that

$$\limsup_{r\to 0} \frac{1}{m(B(r,x))} \int_{B(r,x)} |f(y) - f(x)|\,dy \le |f(x) - c| + \epsilon < 2\epsilon.$$

Since $\epsilon$ is arbitrary, the desired result follows. ∎
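The averages in the definition of $L_f$ can be approximated numerically in dimension one, where $B(r,x) = (x - r, x + r)$ and $m(B(r,x)) = 2r$. The following sketch uses a midpoint rule; the function names and the Heaviside step test function are illustrative assumptions, not taken from the text. For the step function the averages stay at $\frac12$ at the jump, so $0 \notin L_f$ (a null set, as Theorem 3.20 permits), while at points of continuity they tend to $0$, as Exercise 24 predicts.

```python
def lebesgue_average(f, x, r, n=10_000):
    """Midpoint-rule approximation, in dimension one, of
    (1 / m(B(r, x))) * integral over B(r, x) of |f(y) - f(x)| dy,
    with B(r, x) = (x - r, x + r) and m(B(r, x)) = 2r."""
    h = 2 * r / n
    fx = f(x)
    total = sum(abs(f(x - r + (k + 0.5) * h) - fx) for k in range(n))
    return total * h / (2 * r)


def step(y):
    """Hypothetical test function: 1 for y >= 0 and 0 for y < 0."""
    return 1.0 if y >= 0 else 0.0


# At the jump x = 0 the averages do not shrink as r -> 0 (each is 1/2
# up to rounding), so 0 is not in the Lebesgue set of the step function.
jump_averages = [lebesgue_average(step, 0.0, r) for r in (0.1, 0.01, 0.001)]
```

For a continuous function such as $y \mapsto y^2$ at $x = 1$, the same average is roughly $r$, so it tends to $0$.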


Finally, we consider families of sets more general than balls. A family $\{E_r\}_{r>0}$ of Borel subsets of $\mathbb{R}^n$ is said to shrink nicely to $x \in \mathbb{R}^n$ if

• $E_r \subset B(r,x)$ for each $r$;
• there is a constant $\alpha > 0$, independent of $r$, such that $m(E_r) > \alpha\, m(B(r,x))$.

The sets $E_r$ need not contain $x$ itself. For example, if $U$ is any Borel subset of $B(1,0)$ such that $m(U) > 0$, and $E_r = \{x + ry : y \in U\}$, then $\{E_r\}$ shrinks nicely to $x$. Here, then, is the final version of the differentiation theorem.


3.21 The Lebesgue Differentiation Theorem. Suppose $f \in L^1_{\mathrm{loc}}$. For every $x$ in the Lebesgue set of $f$ (in particular, for almost every $x$) we have

$$\lim_{r\to 0} \frac{1}{m(E_r)} \int_{E_r} |f(y) - f(x)|\,dy = 0 \quad\text{and}\quad \lim_{r\to 0} \frac{1}{m(E_r)} \int_{E_r} f(y)\,dy = f(x)$$

for every family $\{E_r\}_{r>0}$ that shrinks nicely to $x$.

Proof. For some $\alpha > 0$ we have

$$\frac{1}{m(E_r)} \int_{E_r} |f(y) - f(x)|\,dy \le \frac{1}{m(E_r)} \int_{B(r,x)} |f(y) - f(x)|\,dy \le \frac{1}{\alpha\, m(B(r,x))} \int_{B(r,x)} |f(y) - f(x)|\,dy.$$

The first limit relation therefore follows from Theorem 3.20, and one sees immediately that it implies the second one by writing the latter in the form (3.19). ∎
We now return to the study of measures. A Borel measure $\nu$ on $\mathbb{R}^n$ will be called regular if

(i) $\nu(K) < \infty$ for every compact $K$;
(ii) $\nu(E) = \inf\{\nu(U) : U \text{ open}, E \subset U\}$ for every $E \in \mathcal{B}_{\mathbb{R}^n}$.

(Condition (ii) is actually implied by condition (i). For $n = 1$ this follows from Theorems 1.16 and 1.18, and we shall prove it for arbitrary $n$ in §7.2. For the time being, we assume (ii) explicitly.) We observe that by (i), every regular measure is $\sigma$-finite. A signed or complex Borel measure $\nu$ will be called regular if $|\nu|$ is regular.

For example, if $f \in L^+(\mathbb{R}^n)$, the measure $f\,dm$ is regular iff $f \in L^1_{\mathrm{loc}}$. Indeed, the condition $f \in L^1_{\mathrm{loc}}$ is clearly equivalent to (i). If this holds, (ii) may be verified directly as follows. Suppose that $E$ is a bounded Borel set. Given $\delta > 0$, by Theorem 2.40 there is a bounded open $U \supset E$ such that $m(U) < m(E) + \delta$ and hence $m(U \setminus E) < \delta$. But then, given $\epsilon > 0$, by Corollary 3.6 there is an open $U \supset E$ such that $\int_{U\setminus E} f\,dm < \epsilon$ and hence $\int_U f\,dm < \int_E f\,dm + \epsilon$. The case of unbounded $E$ follows easily by writing $E = \bigcup_1^\infty E_j$ where $E_j$ is bounded and finding an open $U_j \supset E_j$ such that $\int_{U_j\setminus E_j} f\,dm < \epsilon 2^{-j}$.


3.22 Theorem. Let $\nu$ be a regular signed or complex Borel measure on $\mathbb{R}^n$, and let $d\nu = d\lambda + f\,dm$ be its Lebesgue-Radon-Nikodym representation. Then for $m$-almost every $x \in \mathbb{R}^n$,

$$\lim_{r\to 0} \frac{\nu(E_r)}{m(E_r)} = f(x)$$

for every family $\{E_r\}_{r>0}$ that shrinks nicely to $x$.


Proof. It is easily verified that $d|\nu| = d|\lambda| + |f|\,dm$, so the regularity of $\nu$ implies the regularity of both $\lambda$ and $f\,dm$ (Exercise 26). In particular, $f \in L^1_{\mathrm{loc}}$, so in view of Theorem 3.21, it suffices to show that if $\lambda$ is regular and $\lambda \perp m$, then for $m$-almost every $x$, $\lambda(E_r)/m(E_r) \to 0$ as $r \to 0$ when $E_r$ shrinks nicely to $x$. It also suffices to take $E_r = B(r,x)$ and to assume that $\lambda$ is positive, since for some $\alpha > 0$ we have

$$\frac{|\lambda(E_r)|}{m(E_r)} \le \frac{|\lambda|(E_r)}{m(E_r)} \le \frac{|\lambda|(B(r,x))}{m(E_r)} \le \frac{|\lambda|(B(r,x))}{\alpha\, m(B(r,x))}.$$


Assuming $\lambda \ge 0$, then, let $A$ be a Borel set such that $\lambda(A) = m(A^c) = 0$, and let

$$F_k = \left\{ x \in A : \limsup_{r\to 0} \frac{\lambda(B(r,x))}{m(B(r,x))} > \frac{1}{k} \right\}.$$

We shall show that $m(F_k) = 0$ for all $k$, and this will complete the proof.

The argument is similar to the proof of the maximal theorem. By regularity of $\lambda$, given $\epsilon > 0$ there is an open $U_\epsilon \supset A$ such that $\lambda(U_\epsilon) < \epsilon$. Each $x \in F_k$ is the center of a ball $B_x \subset U_\epsilon$ such that $\lambda(B_x) > k^{-1} m(B_x)$. By Lemma 3.15, if $V_\epsilon = \bigcup_{x\in F_k} B_x$ and $c < m(V_\epsilon)$ there exist $x_1, \ldots, x_J$ such that $B_{x_1}, \ldots, B_{x_J}$ are disjoint and

$$c < 3^n \sum_1^J m(B_{x_j}) \le 3^n k \sum_1^J \lambda(B_{x_j}) \le 3^n k\, \lambda(V_\epsilon) \le 3^n k\, \lambda(U_\epsilon) \le 3^n k \epsilon.$$

We conclude that $m(V_\epsilon) \le 3^n k \epsilon$, and since $F_k \subset V_\epsilon$ and $\epsilon$ is arbitrary, $m(F_k) = 0$. ∎


Exercises 


22. If $f \in L^1(\mathbb{R}^n)$, $f \ne 0$, there exist $C, R > 0$ such that $Hf(x) \ge C|x|^{-n}$ for $|x| > R$. Hence $m(\{x : Hf(x) > \alpha\}) \ge C'/\alpha$ when $\alpha$ is small, so the estimate in the maximal theorem is essentially sharp.


23. A useful variant of the Hardy-Littlewood maximal function is

$$H^*f(x) = \sup\left\{ \frac{1}{m(B)} \int_B |f(y)|\,dy : B \text{ is a ball and } x \in B \right\}.$$

Show that $Hf \le H^*f \le 2^n Hf$.


24. If $f \in L^1_{\mathrm{loc}}$ and $f$ is continuous at $x$, then $x$ is in the Lebesgue set of $f$.


25. If $E$ is a Borel set in $\mathbb{R}^n$, the density $D_E(x)$ of $E$ at $x$ is defined as

$$D_E(x) = \lim_{r\to 0} \frac{m(E \cap B(r,x))}{m(B(r,x))}$$

whenever the limit exists.
a. Show that $D_E(x) = 1$ for a.e. $x \in E$ and $D_E(x) = 0$ for a.e. $x \in E^c$.
b. Find examples of $E$ and $x$ such that $D_E(x)$ is a given number $\alpha \in (0,1)$, or such that $D_E(x)$ does not exist.


26. If $\lambda$ and $\mu$ are positive, mutually singular Borel measures on $\mathbb{R}^n$ and $\lambda + \mu$ is regular, then so are $\lambda$ and $\mu$.


3.5 FUNCTIONS OF BOUNDED VARIATION 


The theorems of the preceding section apply in particular on the real line, where, because of the correspondence between regular Borel measures and increasing functions that we established in §1.5, they yield results about differentiation and integration of functions. As in §1.5, we adopt the notation that if $F$ is an increasing, right continuous function on $\mathbb{R}$, $\mu_F$ is the Borel measure determined by the relation $\mu_F((a,b]) = F(b) - F(a)$. Also, throughout this section the term “almost everywhere” will always refer to Lebesgue measure.

Our first result uses the Lebesgue differentiation theorem to prove the a.e. differentiability of increasing functions.







3.23 Theorem. Let $F : \mathbb{R} \to \mathbb{R}$ be increasing, and let $G(x) = F(x+)$.
a. The set of points at which $F$ is discontinuous is countable.
b. $F$ and $G$ are differentiable a.e., and $F' = G'$ a.e.


Proof. Since $F$ is increasing, the intervals $(F(x-), F(x+))$ $(x \in \mathbb{R})$ are disjoint, and for $|x| < N$ they lie in the interval $(F(-N), F(N))$. Hence

$$\sum_{|x|<N} [F(x+) - F(x-)] \le F(N) - F(-N) < \infty,$$

which implies that $\{x \in (-N,N) : F(x+) \ne F(x-)\}$ is countable. As this is true for all $N$, (a) is proved.

Next, we observe that $G$ is increasing and right continuous, and $G = F$ except perhaps where $F$ is discontinuous. Moreover,

$$G(x+h) - G(x) = \begin{cases} \mu_G((x, x+h]) & \text{if } h > 0, \\ -\mu_G((x+h, x]) & \text{if } h < 0, \end{cases}$$

and the families $\{(x-r, x]\}$ and $\{(x, x+r]\}$ shrink nicely to $x$ as $r = |h| \to 0$. Thus, an application of Theorem 3.22 to the measure $\mu_G$ (which is regular by Theorem 1.18) shows that $G'(x)$ exists for a.e. $x$. To complete the proof, it remains to show that if $H = G - F$, then $H'$ exists and equals zero a.e.

Let $\{x_j\}$ be an enumeration of the points at which $H \ne 0$. Then $H(x_j) > 0$, and as above we have $\sum_{|x_j|<N} H(x_j) < \infty$ for any $N$. Let $\delta_j$ be the point mass at $x_j$ and $\mu = \sum_j H(x_j)\delta_j$. Then $\mu$ is finite on compact sets by the preceding sentence, and hence $\mu$ is regular by Theorems 1.16 and 1.18; also, $\mu \perp m$ since $m(E) = \mu(E^c) = 0$ where $E = \{x_j\}_1^\infty$. But then

$$\left| \frac{H(x+h) - H(x)}{h} \right| \le \frac{H(x+h) + H(x)}{|h|} \le 4\, \frac{\mu((x - 2|h|, x + 2|h|))}{4|h|},$$

which tends to zero as $h \to 0$ for a.e. $x$, by Theorem 3.22. Thus $H' = 0$ a.e., and we are done. ∎


As positive measures on $\mathbb{R}$ are related to increasing functions, complex measures on $\mathbb{R}$ are related to so-called functions of bounded variation. The definition of the latter concept is a bit technical, so some motivation may be appropriate. Intuitively, if $F(t)$ represents the position of a particle moving along the real line at time $t$, the “total variation” of $F$ over the interval $[a,b]$ is the total distance traveled from time $a$ to time $b$, as shown on an odometer. If $F$ has a continuous derivative, this is just the integral of the “speed,” $\int_a^b |F'(t)|\,dt$. To define the total variation without any smoothness hypotheses on $F$ requires a different approach; namely, one partitions $[a,b]$ into subintervals $[t_{j-1}, t_j]$ and approximates $F$ on each subinterval by the linear function whose graph joins $(t_{j-1}, F(t_{j-1}))$ to $(t_j, F(t_j))$, and then passes to a limit.

In making this precise, we begin with a slightly different point of view, taking $a = -\infty$ and considering the total variation as a function of $b$. To wit, if $F : \mathbb{R} \to \mathbb{C}$ and $x \in \mathbb{R}$, we define

$$T_F(x) = \sup\left\{ \sum_1^n |F(x_j) - F(x_{j-1})| : n \in \mathbb{N},\ -\infty < x_0 < \cdots < x_n = x \right\}.$$

$T_F$ is called the total variation function of $F$. We observe that, by the triangle inequality, the sums in the definition of $T_F$ can only increase if additional subdivision points $x_j$ are added. Hence, if $a < b$, the definition of $T_F(b)$ is unaffected if we assume that $a$ is always one of the subdivision points. It follows that


$$T_F(b) - T_F(a) = \sup\left\{ \sum_1^n |F(x_j) - F(x_{j-1})| : n \in \mathbb{N},\ a = x_0 < \cdots < x_n = b \right\}. \tag{3.24}$$


Thus $T_F$ is an increasing function with values in $[0, \infty]$. If $T_F(\infty) = \lim_{x\to\infty} T_F(x)$ is finite, we say that $F$ is of bounded variation on $\mathbb{R}$, and we denote the space of all such $F$ by $BV$.

More generally, the supremum on the right of (3.24) is called the total variation of $F$ on $[a,b]$. It depends only on the values of $F$ on $[a,b]$, so we may define $BV([a,b])$ to be the set of all functions on $[a,b]$ whose total variation on $[a,b]$ is finite. If $F \in BV$, the restriction of $F$ to $[a,b]$ is in $BV([a,b])$ for all $a, b$; indeed, its total variation on $[a,b]$ is nothing but $T_F(b) - T_F(a)$. Conversely, if $F \in BV([a,b])$ and we set $F(x) = F(a)$ for $x < a$ and $F(x) = F(b)$ for $x > b$, then $F \in BV$. By this device the results that we shall prove for $BV$ can also be applied to $BV([a,b])$.


3.25 Examples.
a. If $F : \mathbb{R} \to \mathbb{R}$ is bounded and increasing, then $F \in BV$ (in fact, $T_F(x) = F(x) - F(-\infty)$).
b. If $F, G \in BV$ and $a, b \in \mathbb{C}$, then $aF + bG \in BV$.
c. If $F$ is differentiable on $\mathbb{R}$ and $F'$ is bounded, then $F \in BV([a,b])$ for $-\infty < a < b < \infty$ (by the mean value theorem).
d. If $F(x) = \sin x$, then $F \in BV([a,b])$ for $-\infty < a < b < \infty$, but $F \notin BV$.
e. If $F(x) = x\sin(x^{-1})$ for $x \ne 0$ and $F(0) = 0$, then $F \notin BV([a,b])$ for $a \le 0 < b$ or $a < 0 \le b$.

The verification of these examples is left to the reader (Exercise 27). 
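The sums in (3.24) are easy to evaluate on a given partition, which gives a feel for Examples 3.25d,e. The sketch below uses hypothetical helper names. It recovers the total variation $4$ of $\sin$ on $[0, 2\pi]$, and it shows the sums for $x\sin(x^{-1})$ growing along the partitions $\{0\} \cup \{1/((k+\frac12)\pi) : k < K\}$, on which $\sin(1/x) = \pm 1$. A finite partition only gives a lower bound for the supremum, so this illustrates the examples rather than proving them.

```python
import math


def variation_sum(F, points):
    """Sum of |F(x_j) - F(x_{j-1})| over the sorted partition points.

    F may be real- or complex-valued (abs handles both); the total
    variation on [x_0, x_n] is the supremum of these sums, as in (3.24).
    """
    xs = sorted(points)
    return sum(abs(F(b) - F(a)) for a, b in zip(xs, xs[1:]))


def x_sin_inv(x):
    """F(x) = x sin(1/x) for x != 0, F(0) = 0 (Example 3.25e)."""
    return x * math.sin(1 / x) if x != 0 else 0.0


def peaks_partition(K):
    """0 together with x_k = 1/((k + 1/2) pi), k < K, where sin(1/x) = +-1."""
    return [0.0] + [1 / ((k + 0.5) * math.pi) for k in range(K)]


# Example 3.25d on [0, 2 pi]: a uniform partition containing pi/2 and 3 pi/2
# already attains the total variation 4.
grid = [2 * math.pi * j / 400 for j in range(401)]
sin_var = variation_sum(math.sin, grid)

# Example 3.25e: the sums grow roughly like (2 / pi) log K, without bound.
growth = [variation_sum(x_sin_inv, peaks_partition(K)) for K in (100, 1000, 10000)]

# A complex-valued F: the length of the inscribed polygon of t + it on [0, 1]
# (compare Exercise 38), which is sqrt(2) for this straight segment.
diagonal_length = variation_sum(lambda t: t + 1j * t, [j / 10 for j in range(11)])
```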


3.26 Lemma. If $F \in BV$ is real-valued, then $T_F + F$ and $T_F - F$ are increasing.

Proof. If $x < y$ and $\epsilon > 0$, choose $x_0 < \cdots < x_n = x$ such that

$$\sum_1^n |F(x_j) - F(x_{j-1})| \ge T_F(x) - \epsilon.$$

Then $\sum_1^n |F(x_j) - F(x_{j-1})| + |F(y) - F(x)|$ is an approximating sum for $T_F(y)$, and $F(y) = [F(y) - F(x)] + F(x)$, so

$$\begin{aligned} T_F(y) \pm F(y) &\ge \sum_1^n |F(x_j) - F(x_{j-1})| + |F(y) - F(x)| \pm [F(y) - F(x)] \pm F(x) \\ &\ge T_F(x) - \epsilon \pm F(x). \end{aligned}$$

Since $\epsilon$ is arbitrary, $T_F(y) \pm F(y) \ge T_F(x) \pm F(x)$, as desired. ∎


3.27 Theorem.

a. $F \in BV$ iff $\mathrm{Re}\,F \in BV$ and $\mathrm{Im}\,F \in BV$.

b. If $F : \mathbb{R} \to \mathbb{R}$, then $F \in BV$ iff $F$ is the difference of two bounded increasing functions; for $F \in BV$ these functions may be taken to be $\frac12(T_F + F)$ and $\frac12(T_F - F)$.

c. If $F \in BV$, then $F(x+) = \lim_{y\searrow x} F(y)$ and $F(x-) = \lim_{y\nearrow x} F(y)$ exist for all $x \in \mathbb{R}$, as do $F(\pm\infty) = \lim_{y\to\pm\infty} F(y)$.

d. If $F \in BV$, the set of points at which $F$ is discontinuous is countable.

e. If $F \in BV$ and $G(x) = F(x+)$, then $F'$ and $G'$ exist and are equal a.e.

Proof. (a) is obvious. For (b), the “if” implication is easy (see Examples 3.25a,b). To prove “only if,” observe that by Lemma 3.26, the equation $F = \frac12(T_F + F) - \frac12(T_F - F)$ expresses $F$ as the difference of two increasing functions. Also, the inequalities

$$T_F(y) \pm F(y) \ge T_F(x) \pm F(x) \quad (y > x)$$

imply that

$$|F(y) - F(x)| \le T_F(y) - T_F(x) \le T_F(\infty) - T_F(-\infty) < \infty,$$

so that $F$, and hence $T_F \pm F$, is bounded. Finally, (c), (d), and (e) follow from (a), (b), and Theorem 3.23. ∎


The representation $F = \frac12(T_F + F) - \frac12(T_F - F)$ of a real-valued $F \in BV$ is called the Jordan decomposition of $F$, and $\frac12(T_F + F)$ and $\frac12(T_F - F)$ are called the positive and negative variations of $F$. Since $x^+ = \max(x, 0) = \frac12(|x| + x)$ and $x^- = \max(-x, 0) = \frac12(|x| - x)$ for $x \in \mathbb{R}$, we have

$$\tfrac12(T_F + F)(x) = \sup\left\{ \sum_1^n [F(x_j) - F(x_{j-1})]^+ : x_0 < \cdots < x_n = x \right\} + \tfrac12 F(-\infty).$$
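On a finite grid the positive and negative variations can be accumulated increment by increment. In the sketch below (hypothetical names; grid sums only approximate the suprema), the first grid point plays the role of $-\infty$. This is exact for a function in $BV([a,b])$ extended by constants as described earlier in this section. The negative variation uses the analogous identity for $x^-$, namely $\frac12(T_F - F)(x) = \sup\{\sum_1^n [F(x_j) - F(x_{j-1})]^-\} - \frac12 F(-\infty)$.

```python
import math


def jordan_decomposition(F, grid):
    """Grid approximations of the positive and negative variations of a
    real-valued F. grid is increasing, and grid[0] stands in for -infinity:
        P(x_k) = sum_{j <= k} [F(x_j) - F(x_{j-1})]^+ + F(x_0) / 2,
        N(x_k) = sum_{j <= k} [F(x_j) - F(x_{j-1})]^- - F(x_0) / 2.
    Both lists are nondecreasing, and P - N = F on the grid."""
    base = F(grid[0]) / 2
    P, N = [base], [-base]
    for a, b in zip(grid, grid[1:]):
        d = F(b) - F(a)
        P.append(P[-1] + max(d, 0.0))
        N.append(N[-1] + max(-d, 0.0))
    return P, N


# sin on [0, 2 pi], extended by constants: F(-infinity) = sin(0) = 0, and each
# variation rises by 2 in total, matching T_F(2 pi) = 4.
grid = [2 * math.pi * j / 400 for j in range(401)]
P, N = jordan_decomposition(math.sin, grid)
```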


Theorem 3.27(a,b) leads to the connection between $BV$ and the space of complex Borel measures on $\mathbb{R}$. To make this precise, we introduce the space $NBV$ ($N$ for “normalized”) defined by

$$NBV = \{F \in BV : F \text{ is right continuous and } F(-\infty) = 0\}.$$

We observe that if $F \in BV$, then the function $G$ defined by $G(x) = F(x+) - F(-\infty)$ is in $NBV$ and $G' = F'$ a.e. (That $G \in BV$ follows easily from Theorem 3.27(a,b): if $F$ is real and $F = F_1 - F_2$ where $F_1, F_2$ are increasing, then $G(x) = F_1(x+) - [F_2(x+) + F(-\infty)]$, which is again the difference of two increasing functions.)


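3.28 Lemma. If $F \in BV$, then $T_F(-\infty) = 0$. If $F$ is also right continuous, then so is $T_F$.

Proof. If $\epsilon > 0$ and $x \in \mathbb{R}$, choose $x_0 < \cdots < x_n = x$ so that

$$\sum_1^n |F(x_j) - F(x_{j-1})| \ge T_F(x) - \epsilon.$$

From (3.24) we see that $T_F(x) - T_F(x_0) \ge T_F(x) - \epsilon$, and hence $T_F(y) \le \epsilon$ for $y \le x_0$. Thus $T_F(-\infty) = 0$.

Now suppose that $F$ is right continuous. Given $x \in \mathbb{R}$ and $\epsilon > 0$, let $\alpha = T_F(x+) - T_F(x)$, and choose $\delta > 0$ so that $|F(x+h) - F(x)| < \epsilon$ and $T_F(x+h) - T_F(x+) < \epsilon$ whenever $0 < h < \delta$. For any such $h$, by (3.24) there exist $x = x_0 < \cdots < x_n = x + h$ such that

$$\sum_1^n |F(x_j) - F(x_{j-1})| \ge \tfrac34 [T_F(x+h) - T_F(x)] \ge \tfrac34 \alpha,$$

and hence

$$\sum_2^n |F(x_j) - F(x_{j-1})| \ge \tfrac34 \alpha - |F(x_1) - F(x_0)| \ge \tfrac34 \alpha - \epsilon.$$

Likewise, there exist $x = t_0 < \cdots < t_m = x_1$ such that $\sum_1^m |F(t_j) - F(t_{j-1})| \ge \frac34 \alpha$, and hence

$$\begin{aligned} \alpha + \epsilon &\ge T_F(x+h) - T_F(x) \\ &\ge \sum_1^m |F(t_j) - F(t_{j-1})| + \sum_2^n |F(x_j) - F(x_{j-1})| \\ &\ge \tfrac32 \alpha - \epsilon. \end{aligned}$$

Thus $\alpha \le 4\epsilon$, and since $\epsilon$ is arbitrary, $\alpha = 0$. ∎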


3.29 Theorem. If $\mu$ is a complex Borel measure on $\mathbb{R}$ and $F(x) = \mu((-\infty, x])$, then $F \in NBV$. Conversely, if $F \in NBV$, there is a unique complex Borel measure $\mu_F$ such that $F(x) = \mu_F((-\infty, x])$; moreover, $|\mu_F| = \mu_{T_F}$.

Proof. If $\mu$ is a complex measure, we have $\mu = \mu_1^+ - \mu_1^- + i(\mu_2^+ - \mu_2^-)$ where the $\mu_j^\pm$ are finite measures. If $F_j^\pm(x) = \mu_j^\pm((-\infty, x])$, then $F_j^\pm$ is increasing and right continuous, $F_j^\pm(-\infty) = 0$, and $F_j^\pm(\infty) = \mu_j^\pm(\mathbb{R}) < \infty$. By Theorem 3.27(a,b), the function $F = F_1^+ - F_1^- + i(F_2^+ - F_2^-)$ is in $NBV$. Conversely, by Theorem 3.27 and Lemma 3.28, any $F \in NBV$ can be written in this form with the $F_j^\pm$ increasing and in $NBV$. Each $F_j^\pm$ gives rise to a measure $\mu_j^\pm$ according to Theorem 1.16, so $F(x) = \mu_F((-\infty, x])$ where $\mu_F = \mu_1^+ - \mu_1^- + i(\mu_2^+ - \mu_2^-)$. The proof that $|\mu_F| = \mu_{T_F}$ is outlined in Exercise 28. ∎

The next obvious question is: Which functions in $NBV$ correspond to measures $\mu$ such that $\mu \perp m$ or $\mu \ll m$? One answer is the following:

3.30 Proposition. If $F \in NBV$, then $F' \in L^1(m)$. Moreover, $\mu_F \perp m$ iff $F' = 0$ a.e., and $\mu_F \ll m$ iff $F(x) = \int_{-\infty}^x F'(t)\,dt$.

Proof. We have merely to observe that $F'(x) = \lim_{r\to 0} \mu_F(E_r)/m(E_r)$ where $E_r = (x, x+r]$ or $(x-r, x]$, and apply Theorem 3.22. (The measure $\mu_F$ is automatically regular by Theorem 1.18.) ∎


The condition $\mu_F \ll m$ can also be expressed directly in terms of $F$, as follows. A function $F : \mathbb{R} \to \mathbb{C}$ is called absolutely continuous if for every $\epsilon > 0$ there exists $\delta > 0$ such that for any finite set of disjoint intervals $(a_1, b_1), \ldots, (a_N, b_N)$,

$$\sum_1^N (b_j - a_j) < \delta \implies \sum_1^N |F(b_j) - F(a_j)| < \epsilon. \tag{3.31}$$

More generally, $F$ is said to be absolutely continuous on $[a,b]$ if this condition is satisfied whenever the intervals $(a_j, b_j)$ all lie in $[a,b]$. Clearly, if $F$ is absolutely continuous, then $F$ is uniformly continuous (take $N = 1$ in (3.31)). On the other hand, if $F$ is everywhere differentiable and $F'$ is bounded, then $F$ is absolutely continuous, for $|F(b_j) - F(a_j)| \le (\sup|F'|)(b_j - a_j)$ by the mean value theorem.


3.32 Proposition. If $F \in NBV$, then $F$ is absolutely continuous iff $\mu_F \ll m$.

Proof. If $\mu_F \ll m$, the absolute continuity of $F$ follows by applying Theorem 3.5 to the sets $E = \bigcup_1^N (a_j, b_j]$. To prove the converse, suppose that $E$ is a Borel set such that $m(E) = 0$. If $\epsilon$ and $\delta$ are as in the definition of absolute continuity of $F$, by Theorem 1.18 we can find open sets $U_1 \supset U_2 \supset \cdots \supset E$ such that $m(U_1) < \delta$ (and thus $m(U_j) < \delta$ for all $j$) and $\mu_F(U_j) \to \mu_F(E)$. Each $U_j$ is a disjoint union of open intervals $(a_j^k, b_j^k)$, and since $F$ is continuous,

$$\sum_{k=1}^N |\mu_F((a_j^k, b_j^k))| = \sum_{k=1}^N |F(b_j^k) - F(a_j^k)| < \epsilon$$

for all $N$. Letting $N \to \infty$, we obtain $|\mu_F(U_j)| \le \epsilon$ and hence $|\mu_F(E)| \le \epsilon$. Since $\epsilon$ is arbitrary, $\mu_F(E) = 0$, which shows that $\mu_F \ll m$. ∎


3.33 Corollary. If $f \in L^1(m)$, then the function $F(x) = \int_{-\infty}^x f(t)\,dt$ is in $NBV$ and is absolutely continuous, and $f = F'$ a.e. Conversely, if $F \in NBV$ is absolutely continuous, then $F' \in L^1(m)$ and $F(x) = \int_{-\infty}^x F'(t)\,dt$.

Proof. This follows immediately from Propositions 3.30 and 3.32. ∎
If we consider functions on bounded intervals, this result can be refined a bit.

3.34 Lemma. If $F$ is absolutely continuous on $[a,b]$, then $F \in BV([a,b])$.

Proof. Let $\delta$ be as in the definition of absolute continuity, corresponding to $\epsilon = 1$, and let $N$ be the smallest integer greater than $\delta^{-1}(b - a)$. If $a = x_0 < \cdots < x_n = b$, by inserting more subdivision points if necessary (for instance the points $a + k(b-a)/N$), we can collect the intervals $(x_{j-1}, x_j)$ into at most $N$ groups of consecutive intervals such that the sum of the lengths in each group is less than $\delta$. The sum $\sum |F(x_j) - F(x_{j-1})|$ over each group is at most 1, and hence the total variation of $F$ on $[a,b]$ is at most $N$. ∎


3.35 The Fundamental Theorem of Calculus for Lebesgue Integrals. If $-\infty < a < b < \infty$ and $F : [a,b] \to \mathbb{C}$, the following are equivalent:

a. $F$ is absolutely continuous on $[a,b]$.

b. $F(x) - F(a) = \int_a^x f(t)\,dt$ for some $f \in L^1([a,b], m)$.

c. $F$ is differentiable a.e. on $[a,b]$, $F' \in L^1([a,b], m)$, and $F(x) - F(a) = \int_a^x F'(t)\,dt$.

Proof. To prove that (a) implies (c), we may assume by subtracting a constant from $F$ that $F(a) = 0$. If we set $F(x) = 0$ for $x < a$ and $F(x) = F(b)$ for $x > b$, then $F \in NBV$ by Lemma 3.34, so (c) follows from Corollary 3.33. That (c) implies (b) is trivial. Finally, (b) implies (a) by setting $f(t) = 0$ for $t \notin [a,b]$ and applying Corollary 3.33. ∎


The following decomposition of Borel measures on $\mathbb{R}^n$ is sometimes important. A complex Borel measure $\mu$ on $\mathbb{R}^n$ is called discrete if there is a countable set $\{x_j\} \subset \mathbb{R}^n$ and complex numbers $c_j$ such that $\sum |c_j| < \infty$ and $\mu = \sum c_j \delta_{x_j}$, where $\delta_x$ is the point mass at $x$. On the other hand, $\mu$ is called continuous if $\mu(\{x\}) = 0$ for all $x \in \mathbb{R}^n$. Any complex measure $\mu$ can be written uniquely as $\mu = \mu_d + \mu_c$ where $\mu_d$ is discrete and $\mu_c$ is continuous. Indeed, let $E = \{x : \mu(\{x\}) \ne 0\}$. For any countable subset $F$ of $E$ the series $\sum_{x\in F} \mu(\{x\})$ converges absolutely (to $\mu(F)$), so $\{x \in E : |\mu(\{x\})| > k^{-1}\}$ is finite for all $k$, and it follows that $E$ itself is countable. Hence $\mu_d(A) = \mu(A \cap E)$ is discrete and $\mu_c(A) = \mu(A \setminus E)$ is continuous.

Obviously, if $\mu$ is discrete, then $\mu \perp m$; and if $\mu \ll m$, then $\mu$ is continuous. Thus, by Theorem 3.22, any (regular) complex Borel measure on $\mathbb{R}^n$ can be written uniquely as

$$\mu = \mu_d + \mu_{ac} + \mu_{sc}$$

where $\mu_d$ is discrete, $\mu_{ac}$ is absolutely continuous with respect to $m$, and $\mu_{sc}$ is a “singular continuous” measure, that is, $\mu_{sc}$ is continuous but $\mu_{sc} \perp m$.

The existence of nonzero singular continuous measures in $\mathbb{R}^n$ is evident enough when $n > 1$; the surface measure on the unit sphere discussed in §2.7 is one example. Their existence when $n = 1$ is not quite so obvious; they correspond via Theorem 3.29 to nonconstant functions $F \in NBV$ such that $F$ is continuous but $F' = 0$ a.e. One such function is the Cantor function constructed in §1.5 (extended to $\mathbb{R}$ by setting $F(x) = 0$ for $x < 0$ and $F(x) = 1$ for $x > 1$). More surprisingly, there exist strictly increasing continuous functions $F$ such that $F' = 0$ a.e.; see Exercise 40.
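The Cantor function can be evaluated from ternary expansions. The following sketch is our own helper, not the construction of §1.5 verbatim: it uses exact `Fraction` arithmetic and truncates the expansion after `depth` digits, an assumption that introduces an error of at most $2^{-\mathrm{depth}}$. It illustrates both halves of the statement above: $F$ is increasing with $F(1) - F(0) = 1$, yet it is constant on each removed middle third, so its difference quotients vanish there.

```python
from fractions import Fraction


def cantor(x, depth=60):
    """Cantor function on [0, 1], extended by 0 for x < 0 and 1 for x > 1.

    Ternary digits 0/2 of x become binary digits 0/1 of F(x); at the first
    ternary digit 1, x lies in a removed middle third, where F is constant.
    """
    x = Fraction(x)
    if x <= 0:
        return Fraction(0)
    if x >= 1:
        return Fraction(1)
    value, weight = Fraction(0), Fraction(1, 2)
    for _ in range(depth):
        x *= 3
        digit = int(x)  # next ternary digit: 0, 1, or 2
        x -= digit
        if digit == 1:
            return value + weight
        value += weight * (digit // 2)
        weight /= 2
    return value


# F is flat on the middle third (1/3, 2/3), so difference quotients there are 0,
# although F rises from 0 to 1 over [0, 1].
flat_quotient = (cantor(Fraction(51, 100)) - cantor(Fraction(1, 2))) / Fraction(1, 100)
```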
If $F \in NBV$, it is customary to denote the integral of a function $g$ with respect to the measure $\mu_F$ by $\int g\,dF$ or $\int g(x)\,dF(x)$; such integrals are called Lebesgue-Stieltjes integrals. We conclude by presenting an integration-by-parts formula for Lebesgue-Stieltjes integrals; other variants of this result can be found in Exercises 34 and 35.

3.36 Theorem. If $F$ and $G$ are in $NBV$ and at least one of them is continuous, then for $-\infty < a < b < \infty$,

$$\int_{(a,b]} F\,dG + \int_{(a,b]} G\,dF = F(b)G(b) - F(a)G(a).$$

Proof. $F$ and $G$ are linear combinations of increasing functions in $NBV$ by Theorem 3.27(a,b), so a simple calculation shows that it suffices to assume $F$ and $G$ increasing. Suppose for the sake of definiteness that $G$ is continuous, and let $\Omega = \{(x,y) : a < x \le y \le b\}$. We use Fubini's theorem to compute $\mu_F \times \mu_G(\Omega)$ in two ways:

$$\begin{aligned} \mu_F \times \mu_G(\Omega) &= \int_{(a,b]} \int_{(a,y]} d\mu_F(x)\,d\mu_G(y) = \int_{(a,b]} [F(y) - F(a)]\,dG(y) \\ &= \int_{(a,b]} F\,dG - F(a)[G(b) - G(a)], \end{aligned}$$

and since $G(x) = G(x-)$,

$$\begin{aligned} \mu_F \times \mu_G(\Omega) &= \int_{(a,b]} \int_{[x,b]} d\mu_G(y)\,d\mu_F(x) = \int_{(a,b]} [G(b) - G(x)]\,dF(x) \\ &= G(b)[F(b) - F(a)] - \int_{(a,b]} G\,dF. \end{aligned}$$

Subtracting these two equations, we obtain the desired result. ∎
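Theorem 3.36 can be checked numerically in a simple case. The sketch below assumes, beyond the text, that $F$ is a pure jump function $F(x) = \sum_{p_i \le x} s_i$, so that $\mu_F$ is a finite sum of point masses, and that $G$ is $C^1$, so that $\int_{(a,b]} F\,dG = \int_a^b F(x)G'(x)\,dx$; that integral is approximated by the midpoint rule. The function name `by_parts_sides` is hypothetical. Only values on $[a,b]$ enter either side, so $G$ is not normalized at $-\infty$ here.

```python
import math


def by_parts_sides(jumps, G, dG, a, b, n=20_000):
    """Return (lhs, rhs) of Theorem 3.36 for a pure-jump F and a C^1 G.

    jumps is a list of (p, s) pairs; F(x) = sum of s over p <= x, which is
    right continuous. Both sides only involve [a, b] and the measures
    restricted to (a, b].
    """
    def F(x):
        return sum(s for p, s in jumps if p <= x)

    h = (b - a) / n
    mids = (a + (k + 0.5) * h for k in range(n))
    # d mu_G = G'(x) dx for C^1 G, so approximate the integral of F dG by
    # the midpoint rule applied to F(x) G'(x).
    int_F_dG = h * sum(F(x) * dG(x) for x in mids)
    # mu_F restricted to (a, b] is a sum of point masses s at the jumps p.
    int_G_dF = sum(s * G(p) for p, s in jumps if a < p <= b)
    return int_F_dG + int_G_dF, F(b) * G(b) - F(a) * G(a)


jumps = [(-2.0, 0.3), (-0.5, 1.0), (0.25, 2.0), (0.7, -0.5), (2.0, 3.0)]
lhs, rhs = by_parts_sides(jumps, math.sin, math.cos, -1.0, 1.0)
```

With these jumps on $(a,b] = (-1,1]$, both sides equal $3.1\sin 1$, up to the $O(h)$ error the midpoint rule makes across each jump.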
Exercises 


27. Verify the assertions in Examples 3.25. 


28. If $F \in NBV$, let $G(x) = |\mu_F|((-\infty, x])$. Prove that $|\mu_F| = \mu_{T_F}$ by showing that $G = T_F$ via the following steps.
a. From the definition of $T_F$, $T_F \le G$.
b. $|\mu_F(E)| \le \mu_{T_F}(E)$ when $E$ is an interval, and hence when $E$ is a Borel set.
c. $|\mu_F| \le \mu_{T_F}$, and hence $G \le T_F$. (Use Exercise 21.)


29. If $F \in NBV$ is real-valued, then $\mu_F^+ = \mu_P$ and $\mu_F^- = \mu_N$ where $P$ and $N$ are the positive and negative variations of $F$. (Use Exercise 28.)


30. Construct an increasing function on R whose set of discontinuities is Q. 
31. Let $F(x) = x^2\sin(x^{-1})$ and $G(x) = x^2\sin(x^{-2})$ for $x \ne 0$, and $F(0) = G(0) = 0$.

a. $F$ and $G$ are differentiable everywhere (including $x = 0$).

b. $F \in BV([-1,1])$, but $G \notin BV([-1,1])$.


32. If $F_1, F_2, \ldots, F \in NBV$ and $F_j \to F$ pointwise, then $T_F \le \liminf T_{F_j}$.
33. If $F$ is increasing on $\mathbb{R}$, then $F(b) - F(a) \ge \int_a^b F'(t)\,dt$.


34. Suppose $F, G \in NBV$ and $-\infty < a < b < \infty$.
a. By adapting the proof of Theorem 3.36, show that

$$\int_{[a,b]} \frac{F(x) + F(x-)}{2}\,dG(x) + \int_{[a,b]} \frac{G(x) + G(x-)}{2}\,dF(x) = F(b)G(b) - F(a-)G(a-).$$

b. If there are no points in $[a,b]$ where $F$ and $G$ are both discontinuous, then

$$\int_{[a,b]} F\,dG + \int_{[a,b]} G\,dF = F(b)G(b) - F(a-)G(a-).$$
[a,b] [a,b] 
35. If $F$ and $G$ are absolutely continuous on $[a,b]$, then so is $FG$, and

$$\int_a^b (FG' + GF')(x)\,dx = F(b)G(b) - F(a)G(a).$$


36. Let $G$ be a continuous increasing function on $[a,b]$ and let $G(a) = c$, $G(b) = d$.
a. If $E \subset [c,d]$ is a Borel set, then $m(E) = \mu_G(G^{-1}(E))$. (First consider the case where $E$ is an interval.)

b. If $f$ is a Borel measurable and integrable function on $[c,d]$, then $\int_c^d f(y)\,dy = \int_a^b f(G(x))\,dG(x)$. In particular, $\int_c^d f(y)\,dy = \int_a^b f(G(x))G'(x)\,dx$ if $G$ is absolutely continuous.

c. The validity of (b) may fail if $G$ is merely right continuous rather than continuous.


37. Suppose $F : \mathbb{R} \to \mathbb{C}$. There is a constant $M$ such that $|F(x) - F(y)| \le M|x - y|$ for all $x, y \in \mathbb{R}$ (that is, $F$ is Lipschitz continuous) iff $F$ is absolutely continuous and $|F'| \le M$ a.e.


38. If $f : [a,b] \to \mathbb{R}$, consider the graph of $f$ as a subset of $\mathbb{C}$, namely, $\{t + if(t) : t \in [a,b]\}$. The length $L$ of this graph is by definition the supremum of the lengths of all inscribed polygons. (An “inscribed polygon” is the union of the line segments joining $t_{j-1} + if(t_{j-1})$ to $t_j + if(t_j)$, $1 \le j \le n$, where $a = t_0 < \cdots < t_n = b$.)
a. Let $F(t) = t + if(t)$; then $L$ is the total variation of $F$ on $[a,b]$.
b. If $f$ is absolutely continuous, $L = \int_a^b [1 + f'(t)^2]^{1/2}\,dt$.


39. If $\{F_j\}$ is a sequence of nonnegative increasing functions on $[a,b]$ such that $F(x) = \sum_1^\infty F_j(x) < \infty$ for all $x \in [a,b]$, then $F'(x) = \sum_1^\infty F_j'(x)$ for a.e. $x \in [a,b]$. (It suffices to assume $F_j \in NBV$. Consider the measures $\mu_{F_j}$.)
40. Let $F$ denote the Cantor function on $[0,1]$ (see §1.5), and set $F(x) = 0$ for $x < 0$ and $F(x) = 1$ for $x > 1$. Let $\{[a_n, b_n]\}$ be an enumeration of the closed subintervals of $[0,1]$ with rational endpoints, and let $F_n(x) = F((x - a_n)/(b_n - a_n))$. Then $G = \sum_1^\infty 2^{-n} F_n$ is continuous and strictly increasing on $[0,1]$, and $G' = 0$ a.e. (Use Exercise 39.)


41. Let $A \subset [0,1]$ be a Borel set such that $0 < m(A \cap I) < m(I)$ for every subinterval $I$ of $[0,1]$ (Exercise 33, Chapter 1).
a. Let $F(x) = m([0,x] \cap A)$. Then $F$ is absolutely continuous and strictly increasing on $[0,1]$, but $F' = 0$ on a set of positive measure.
b. Let $G(x) = m([0,x] \cap A) - m([0,x] \setminus A)$. Then $G$ is absolutely continuous on $[0,1]$, but $G$ is not monotone on any subinterval of $[0,1]$.


42. A function $F : (a,b) \to \mathbb{R}$ $(-\infty \le a < b \le \infty)$ is called convex if

$$F(\lambda s + (1 - \lambda)t) \le \lambda F(s) + (1 - \lambda)F(t)$$

for all $s, t \in (a,b)$ and $\lambda \in (0,1)$. (Geometrically, this says that the graph of $F$ over the interval from $s$ to $t$ lies underneath the line segment joining $(s, F(s))$ to $(t, F(t))$.)

a. $F$ is convex iff for all $s, t, s', t' \in (a,b)$ such that $s \le s' < t'$ and $s < t \le t'$,

$$\frac{F(t) - F(s)}{t - s} \le \frac{F(t') - F(s')}{t' - s'}.$$

b. $F$ is convex iff $F$ is absolutely continuous on every compact subinterval of $(a,b)$ and $F'$ is increasing (on the set where it is defined).

c. If $F$ is convex and $t_0 \in (a,b)$, there exists $\beta \in \mathbb{R}$ such that $F(t) - F(t_0) \ge \beta(t - t_0)$ for all $t \in (a,b)$.

d. (Jensen's Inequality) If $(X, \mathcal{M}, \mu)$ is a measure space with $\mu(X) = 1$, $g : X \to (a,b)$ is in $L^1(\mu)$, and $F$ is convex on $(a,b)$, then

$$F\left( \int g\,d\mu \right) \le \int F \circ g\,d\mu.$$

(Let $t_0 = \int g\,d\mu$ and $t = g(x)$ in (c), and integrate.)


3.6 NOTES AND REFERENCES 


§3.2: The Lebesgue-Radon-Nikodym theorem was proved by Lebesgue [92] in the case where $\mu$ is Lebesgue measure on $\mathbb{R}^n$. Under the hypothesis $\nu \ll \mu$, it was generalized by Radon [111] to arbitrary regular Borel measures on $\mathbb{R}^n$ and by Nikodym [107] to measures on abstract spaces. The Lebesgue decomposition in the
abstract setting appears in Saks [128]. The proof of the Lebesgue-Radon-Nikodym 
theorem in the text is similar to, but more efficient than, the one in Halmos [62]; I 
learned it from L. Loomis. 
§3.3: The characterization

$$|\nu|(E) = \sup\left\{ \sum_1^n |\nu(E_j)| : n \in \mathbb{N},\ E_1, \ldots, E_n \text{ disjoint},\ E = \bigcup_1^n E_j \right\}$$

of the total variation of a complex measure $\nu$ (see Exercise 21) is usually taken as the definition of $|\nu|$. Our definition seems more generally useful, and it is certainly easier to compute with.


§3.4: Theorems 3.21 and 3.22 are due to Lebesgue [92], but the line of argument we have presented is essentially that of Wiener [161], and the maximal function $Hf$, in dimension one, was first studied in Hardy and Littlewood [65]. Our proof of
Theorem 3.18 is illustrative of a general technique that has been much exploited in 
recent years, namely, controlling the limiting behavior of a family of operators by 
means of estimates on an appropriate maximal function. 

Lemma 3.15, a simplified version of Wiener’s covering lemma, is taken from 
Rudin [125]. There is also an older and more delicate covering theorem, due to 
Vitali, which is used for similar purposes: 


If $E \subset \mathbb{R}^n$ and $\mathcal{Q}$ is a family of cubes such that each $x \in E$ is contained in members of $\mathcal{Q}$ of arbitrarily small diameter, then there is a (finite or infinite) disjoint sequence $\{Q_j\} \subset \mathcal{Q}$ such that $m(E \setminus \bigcup_j Q_j) = 0$.


Proofs can be found in many books, for example, Cohn [27, §6.2], Falconer [39, §1.3], and Hewitt and Stromberg [76, §17].


§3.5: The main results of this section are due to Lebesgue and Vitali; see Hawkins [70] for detailed references. Exercise 36 gives one form of the change-of-variable formula for Lebesgue integrals; others can be found in Serrin and Varberg [133]. Exercise 39 is a theorem of Fubini, and the example in Exercise 40 is due to Brown [21].

The Stieltjes integral $\int_a^b g\,dF$ was originally defined, under the hypothesis that $F$ is an increasing function on $[a,b]$, as a limit of Riemann sums $\sum g(t_j^*)[F(t_j) - F(t_{j-1})]$ with $t_j^* \in [t_{j-1}, t_j]$. The theory of such “Riemann-Stieltjes” integrals is much like that of the ordinary Riemann integral, but some care is needed to handle cases where $g$ and $F$ are both allowed to be discontinuous. See ter Horst [148], which contains the analogue of Theorem 2.28 for Stieltjes integrals.

The example of the Cantor function shows that a continuous, a.e.-differentiable function need not be the integral of its derivative. It is a highly nontrivial theorem that if $F$ is continuous on $[a,b]$, $F'(x)$ exists for every $x \in [a,b] \setminus A$ where $A$ is countable, and $F' \in L^1$, then $F$ is absolutely continuous and hence can be recovered from $F'$ by integration. A proof can be found in Cohn [27, §6.3]; see also Rudin [125, Theorem 7.26] for the somewhat easier case when $A = \varnothing$.

However, this is not the end of the story, for there exist everywhere differentiable functions $F$ such that $F' \notin L^1$. Perhaps the simplest example is $F(x) = x^2\sin(x^{-2})$ (see Exercise 31). Here the only trouble is at $x = 0$, so for $a < 0 < b$ one could consider $\int_a^b F'(t)\,dt$ as an improper integral, i.e., the limit of Lebesgue integrals over $[a,b] \setminus [-\epsilon, \epsilon]$ as $\epsilon \to 0$. But it is not hard to construct examples in which the singularities of $F'$ are so complicated that $F'$ is not Lebesgue integrable on any interval. In this situation the Lebesgue integral is simply insufficient. However, the Henstock-Kurzweil integral (or the Denjoy or Perron integral) that was discussed in §2.8 is powerful enough to integrate such $F'$, and by using this integral one obtains the general fundamental theorem of calculus: If $F$ is everywhere differentiable on $[a,b]$, then $F(b) - F(a) = \int_a^b F'(t)\,dt$.





Point Set Topology 


The concepts of limit, convergence, and continuity are central to all of analysis, and 
it is useful to have a general framework for studying them that includes the classical 
manifestations as special cases. One such framework, which has the advantage of 
not requiring many ideas beyond those occurring in analysis on Euclidean space, is
that of metric spaces. However, metric spaces are not sufficiently general to describe 
even some very classical modes of convergence, for example, pointwise convergence 
of functions on R. A more flexible theory can be built by taking the open sets, rather 
than a metric, as the primitive data, and it is this theory that we shall explore in the 
present chapter. 


4.1 TOPOLOGICAL SPACES 


Let $X$ be a nonempty set. A topology on $X$ is a family $\mathcal{T}$ of subsets of $X$ that contains $\varnothing$ and $X$ and is closed under arbitrary unions and finite intersections (i.e., if $\{U_\alpha\}_{\alpha\in A} \subset \mathcal{T}$ then $\bigcup_{\alpha\in A} U_\alpha \in \mathcal{T}$, and if $U_1, \ldots, U_n \in \mathcal{T}$ then $\bigcap_1^n U_j \in \mathcal{T}$). The pair $(X, \mathcal{T})$ is called a topological space. If $\mathcal{T}$ is understood, we shall simply refer to the topological space $X$. Let us examine a few examples:


• If $X$ is any nonempty set, $\mathcal{P}(X)$ and $\{\varnothing, X\}$ are topologies on $X$. They are called the discrete topology and the trivial (or indiscrete) topology, respectively.

• If $X$ is an infinite set, $\{U \subset X : U = \varnothing \text{ or } U^c \text{ is finite}\}$ is a topology on $X$, called the cofinite topology.
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e If X is a metric space, the collection of all open sets with respect to the metric 
is a topology on X. 


• If $(X, \mathcal{T})$ is a topological space and $Y \subset X$, then $\mathcal{T}_Y = \{U \cap Y : U \in \mathcal{T}\}$ is
a topology on $Y$, called the relative topology induced by $\mathcal{T}$.
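For a finite set the axioms can be checked mechanically, since an arbitrary union (or finite intersection) of members of a finite family is obtained by repeated pairwise unions (or intersections). The following Python sketch is purely illustrative; the helper names `powerset`, `is_topology`, and `relative_topology` are our own, and $X$ is assumed finite. It verifies the discrete, trivial, and relative topology examples above on a three-point set.

```python
from itertools import combinations

def powerset(X):
    """All subsets of a finite set X, as frozensets."""
    X = list(X)
    return [frozenset(c) for r in range(len(X) + 1) for c in combinations(X, r)]

def is_topology(X, T):
    """Check the topology axioms for a family T of subsets of a finite set X.

    Because T is finite, closure under arbitrary unions (resp. finite
    intersections) follows from closure under pairwise unions (resp.
    intersections) by induction; the empty union is the empty set.
    """
    X = frozenset(X)
    T = {frozenset(U) for U in T}
    if frozenset() not in T or X not in T:
        return False
    if any(not U <= X for U in T):
        return False
    return all(U | V in T and U & V in T for U, V in combinations(T, 2))

def relative_topology(T, Y):
    """The relative topology {U ∩ Y : U in T} on a subset Y."""
    Y = frozenset(Y)
    return {frozenset(U) & Y for U in T}

X = {1, 2, 3}
chain = [set(), {1}, {1, 2}, X]   # a nested family of open sets
bad = [set(), {1}, {2}, X]        # {1} ∪ {2} = {1, 2} is missing

assert is_topology(X, powerset(X))    # discrete topology
assert is_topology(X, [set(), X])     # trivial topology
assert is_topology(X, chain)
assert not is_topology(X, bad)
assert is_topology({2, 3}, relative_topology(chain, {2, 3}))
```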


We now present the basic terminology concerning topological spaces. Most of
these concepts are already familiar in the context of metric spaces. Until further
notice, $(X, \mathcal{T})$ will be a fixed topological space.

The members of $\mathcal{T}$ are called open sets, and their complements are called closed
sets. If $Y \subset X$, the open (closed) subsets of $Y$ in the relative topology are called
relatively open (closed). We observe that, by de Morgan's laws, the family of closed
sets is closed under arbitrary intersections and finite unions.

If $A \subset X$, the union of all open sets contained in $A$ is called the interior of $A$,
and the intersection of all closed sets containing $A$ is called the closure of $A$. We
denote the interior and closure of $A$ by $A^\circ$ and $\overline{A}$, respectively. Clearly $A^\circ$ is the
largest open set contained in $A$ and $\overline{A}$ is the smallest closed set containing $A$, and we
have $(A^\circ)^c = \overline{A^c}$ and $(\overline{A})^c = (A^c)^\circ$. The difference $\overline{A} \setminus A^\circ = \overline{A} \cap (A^\circ)^c$ is called the
boundary of $A$ and is denoted by $\partial A$. If $\overline{A} = X$, $A$ is called dense in $X$. On the
other hand, if $(\overline{A})^\circ = \varnothing$, $A$ is called nowhere dense.
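These definitions become concrete on a finite space. In the Python sketch below (function names are our own; $X$ is assumed finite and $\mathcal{T}$ is given as a list of sets), closed sets are computed as complements of open sets, and the identity $(A^\circ)^c = \overline{A^c}$ is spot-checked on a small example.

```python
def interior(A, T):
    """Union of all open sets contained in A."""
    A = frozenset(A)
    result = frozenset()
    for U in map(frozenset, T):
        if U <= A:
            result |= U
    return result

def closure(A, X, T):
    """Intersection of all closed sets (complements of open sets) containing A."""
    A, X = frozenset(A), frozenset(X)
    result = X
    for U in map(frozenset, T):
        F = X - U
        if A <= F:
            result &= F
    return result

def boundary(A, X, T):
    """The boundary: closure of A minus interior of A."""
    return closure(A, X, T) - interior(A, T)

X = {1, 2, 3}
T = [set(), {1}, {1, 2}, X]

assert interior({2}, T) == frozenset()
assert closure({2}, X, T) == {2, 3}
assert boundary({2}, X, T) == {2, 3}
assert closure({1}, X, T) == X                    # {1} is dense
assert interior(closure({3}, X, T), T) == set()   # {3} is nowhere dense

# (A°)^c equals the closure of A^c, here for A = {1, 2}
A = frozenset({1, 2})
assert frozenset(X) - interior(A, T) == closure(frozenset(X) - A, X, T)
```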

If $x \in X$ (or $E \subset X$), a neighborhood of $x$ (or $E$) is a set $A \subset X$ such that
$x \in A^\circ$ (or $E \subset A^\circ$). Thus, a set $A$ is open iff it is a neighborhood of itself. (Some
authors require neighborhoods to be open sets; we do not.) A point $x$ is called an
accumulation point of $A$ if $A \cap (U \setminus \{x\}) \neq \varnothing$ for every neighborhood $U$ of $x$.
(Other terms sometimes used for the same concept are "cluster point" and "limit
point." We shall use "cluster point" to mean something a bit different below.)


4.1 Proposition. If $A \subset X$, let $\operatorname{acc}(A)$ be the set of accumulation points of $A$. Then
$\overline{A} = A \cup \operatorname{acc}(A)$, and $A$ is closed iff $\operatorname{acc}(A) \subset A$.


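Proof. If $x \notin \overline{A}$, then $(\overline{A})^c$ is a neighborhood of $x$ that does not intersect $A$, so
$x \notin \operatorname{acc}(A)$; thus $A \cup \operatorname{acc}(A) \subset \overline{A}$. If $x \notin A \cup \operatorname{acc}(A)$, there is an open $U$ containing
$x$ such that $U \cap A = \varnothing$, so that $A \subset U^c$ and $x \notin \overline{A}$. Thus $\overline{A} \subset A \cup \operatorname{acc}(A)$. Finally,
$A$ is closed iff $A = \overline{A}$, and this happens iff $\operatorname{acc}(A) \subset A$. ∎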


If $\mathcal{T}_1$ and $\mathcal{T}_2$ are topologies on $X$ such that $\mathcal{T}_1 \subset \mathcal{T}_2$, we say that $\mathcal{T}_1$ is weaker
(or coarser) than $\mathcal{T}_2$, or that $\mathcal{T}_2$ is stronger (or finer) than $\mathcal{T}_1$. Clearly the trivial
topology is the weakest topology on $X$, while the discrete topology is the strongest. If
$\mathcal{E} \subset \mathcal{P}(X)$, there is a unique weakest topology $\mathcal{T}(\mathcal{E})$ on $X$ that contains $\mathcal{E}$, namely the
intersection of all topologies on $X$ containing $\mathcal{E}$. It is called the topology generated
by $\mathcal{E}$, and $\mathcal{E}$ is sometimes called a subbase for $\mathcal{T}(\mathcal{E})$.

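If $\mathcal{T}$ is a topology on $X$, a neighborhood base for $\mathcal{T}$ at $x \in X$ is a family $\mathcal{N} \subset \mathcal{T}$
such that


• $x \in V$ for all $V \in \mathcal{N}$;


• if $U \in \mathcal{T}$ and $x \in U$, there exists $V \in \mathcal{N}$ such that $x \in V$ and $V \subset U$.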
A base for $\mathcal{T}$ is a family $\mathcal{E} \subset \mathcal{T}$ that contains a neighborhood base for $\mathcal{T}$ at each
$x \in X$. For example, if $X$ is a metric space, the collection of open balls centered at
$x$ is a neighborhood base for the metric topology at $x$, and the collection of all open
balls in $X$ is a base.


4.2 Proposition. If $\mathcal{T}$ is a topology on $X$ and $\mathcal{E} \subset \mathcal{T}$, then $\mathcal{E}$ is a base for $\mathcal{T}$ iff every
nonempty $U \in \mathcal{T}$ is a union of members of $\mathcal{E}$.


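Proof. If $\mathcal{E}$ is a base, $U \in \mathcal{T}$, and $x \in U$, there exists $V_x \in \mathcal{E}$ with $x \in V_x \subset U$,
so $U = \bigcup_{x \in U} V_x$. Conversely, if every nonempty $U \in \mathcal{T}$ is a union of members of
$\mathcal{E}$, then $\{V \in \mathcal{E} : x \in V\}$ is clearly a neighborhood base at $x$, so $\mathcal{E}$ is a base. ∎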


4.3 Proposition. If $\mathcal{E} \subset \mathcal{P}(X)$, in order for $\mathcal{E}$ to be a base for a topology on $X$ it is
necessary and sufficient that the following two conditions be satisfied:


a. each $x \in X$ is contained in some $V \in \mathcal{E}$;
b. if $U, V \in \mathcal{E}$ and $x \in U \cap V$, there exists $W \in \mathcal{E}$ with $x \in W \subset (U \cap V)$.


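Proof. The necessity is clear, since if $U, V$ are open, then so is $U \cap V$. To prove
the sufficiency, let

$$\mathcal{T} = \{U \subset X : \text{for every } x \in U \text{ there exists } V \in \mathcal{E} \text{ with } x \in V \subset U\}.$$

Then $X \in \mathcal{T}$ by condition (a) and $\varnothing \in \mathcal{T}$ trivially, and $\mathcal{T}$ is obviously closed under
unions. If $U_1, U_2 \in \mathcal{T}$ and $x \in U_1 \cap U_2$, there exist $V_1, V_2 \in \mathcal{E}$ with $x \in V_1 \subset U_1$
and $x \in V_2 \subset U_2$, and by condition (b) there exists $W \in \mathcal{E}$ with $x \in W \subset (V_1 \cap V_2)$.
Thus $U_1 \cap U_2 \in \mathcal{T}$, so by induction $\mathcal{T}$ is closed under finite intersections. Therefore
$\mathcal{T}$ is a topology, and $\mathcal{E}$ is clearly a base for $\mathcal{T}$. ∎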


4.4 Proposition. If $\mathcal{E} \subset \mathcal{P}(X)$, the topology $\mathcal{T}(\mathcal{E})$ generated by $\mathcal{E}$ consists of $\varnothing$, $X$,
and all unions of finite intersections of members of $\mathcal{E}$.


Proof. The family of finite intersections of sets in $\mathcal{E}$, together with $X$, satisfies
the conditions of Proposition 4.3, so by Proposition 4.2 the family of all unions of
such sets, together with $\varnothing$, is a topology. It is obviously contained in $\mathcal{T}(\mathcal{E})$, hence
equal to $\mathcal{T}(\mathcal{E})$. ∎
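Proposition 4.4 is an explicit recipe for $\mathcal{T}(\mathcal{E})$, which on a finite set can be carried out directly. The Python sketch below (the name `generated_topology` is our own; $X$ is assumed finite) first forms the finite intersections of members of $\mathcal{E}$ together with $X$, then closes under unions. In contrast to the $\sigma$-algebra case, one round of intersections followed by unions suffices.

```python
from itertools import combinations

def generated_topology(X, E):
    """T(E) for a family E of subsets of a finite set X, following Proposition 4.4:
    the empty set, X, and all unions of finite intersections of members of E."""
    X = frozenset(X)
    E = [frozenset(S) for S in E]
    # Finite intersections of members of E, together with X
    base = {X}
    for r in range(1, len(E) + 1):
        for family in combinations(E, r):
            base.add(frozenset.intersection(*family))
    # Close under pairwise unions; for a finite family this yields all unions.
    opens = {frozenset()} | base
    changed = True
    while changed:
        changed = False
        for U, V in combinations(list(opens), 2):
            if U | V not in opens:
                opens.add(U | V)
                changed = True
    return opens

T = generated_topology({1, 2, 3, 4}, [{1, 2}, {2, 3}])
expected = [set(), {2}, {1, 2}, {2, 3}, {1, 2, 3}, {1, 2, 3, 4}]
assert T == {frozenset(S) for S in expected}
```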


Note how the simplicity of this proposition contrasts with the corresponding result 
for $\sigma$-algebras (Proposition 1.23). What makes life easier here is that only finite
intersections are involved. 

The concept of topological space is general enough to include a great profusion 
of interesting examples, but — by the same token — too general to yield many 
interesting theorems. To build a reasonable theory one must usually restrict the class 
of spaces under consideration. The remainder of this section is devoted to a discussion 
of two types of restrictions that are commonly made, the so-called countability and 
separation axioms. 
A topological space $(X, \mathcal{T})$ satisfies the first axiom of countability, or is first
countable, if there is a countable neighborhood base for $\mathcal{T}$ at every point of $X$. (It is
useful to observe that if $X$ is first countable, for every $x \in X$ there is a neighborhood
base $\{U_j\}_1^\infty$ at $x$ such that $U_j \supset U_{j+1}$ for all $j$. Indeed, if $\{V_j\}_1^\infty$ is any countable
neighborhood base at $x$, we can take $U_j = \bigcap_1^j V_i$.) The space $(X, \mathcal{T})$ satisfies the
second axiom of countability, or is second countable, if $\mathcal{T}$ has a countable base.
Also, $(X, \mathcal{T})$ is separable if $X$ has a countable dense subset. Every metric space is
first countable (the balls of rational radius about $x$ are a neighborhood base at $x$), and
a metric space is second countable iff it is separable (Exercise 5). The latter fact can
be partly generalized:


4.5 Proposition. Every second countable space is separable. 


Proof. If $X$ is second countable, let $\mathcal{E}$ be a countable base for the topology,
and for each nonempty $U \in \mathcal{E}$ pick a point $x_U \in U$. Then the complement of the closure of
$\{x_U : U \in \mathcal{E}\}$ is an open set that does not include any nonempty $U \in \mathcal{E}$; hence, by Proposition 4.2, it is empty and
$\{x_U : U \in \mathcal{E}\}$ is dense. ∎


A sequence $\{x_j\}$ in a topological space $X$ converges to $x \in X$ (in symbols:
$x_j \to x$) if for every neighborhood $U$ of $x$ there exists $J \in \mathbb{N}$ such that $x_j \in U$ for all
$j > J$. First countable spaces have the pleasant property that such things as closure
and continuity can be characterized in terms of sequential convergence, which is
not the case in more general spaces, as we shall see. For example:


4.6 Proposition. If $X$ is first countable and $A \subset X$, then $x \in \overline{A}$ iff there is a
sequence $\{x_j\}$ in $A$ that converges to $x$.


Proof. Let $\{U_j\}$ be a countable neighborhood base at $x$ with $U_j \supset U_{j+1}$ for all
$j$. If $x \in \overline{A}$, then $U_j \cap A \neq \varnothing$ for all $j$. Pick $x_j \in U_j \cap A$; since $U_k \subset U_j$ for $k > j$
and every neighborhood of $x$ contains some $U_j$, it is clear that $x_j \to x$. On the other
hand, if $x \notin \overline{A}$ and $\{x_j\}$ is any sequence in $A$, then $(\overline{A})^c$ is a neighborhood of $x$
containing no $x_j$, so $x_j \not\to x$. ∎


Lastly, we discuss the separation axioms. These are properties of a topological
space, labeled $T_0, \dots, T_4$, that guarantee the existence of open sets that separate
points or closed sets from each other. If $X$ has the property $T_j$, we say that $X$ is a $T_j$
space or that the topology on $X$ is $T_j$.


$T_0$: If $x \neq y$, there is an open set containing $x$ but not $y$ or an open set containing
$y$ but not $x$.

$T_1$: If $x \neq y$, there is an open set containing $y$ but not $x$.

$T_2$: If $x \neq y$, there are disjoint open sets $U, V$ with $x \in U$ and $y \in V$.

$T_3$: $X$ is a $T_1$ space, and for any closed set $A \subset X$ and any $x \in A^c$ there are
disjoint open sets $U, V$ with $x \in U$ and $A \subset V$.

$T_4$: $X$ is a $T_1$ space, and for any disjoint closed sets $A, B$ in $X$ there are disjoint
open sets $U, V$ with $A \subset U$ and $B \subset V$.


$T_2$, $T_3$, and $T_4$ also have other names: A $T_2$ space is a Hausdorff space, a $T_3$
space is a regular space, and a $T_4$ space is a normal space. (Some authors do not
require regular and normal spaces to be $T_1$.) There is an additional useful separation
condition, intermediate between $T_3$ and $T_4$, that we shall discuss in §4.2.

The following characterization of $T_1$ spaces is useful. It shows in particular that
every normal space is regular and that every regular space is Hausdorff.


4.7 Proposition. $X$ is a $T_1$ space iff $\{x\}$ is closed for every $x \in X$.


Proof. If $X$ is $T_1$ and $x \in X$, for each $y \neq x$ there is an open $U_y$ containing $y$
but not $x$; thus $\{x\}^c = \bigcup_{y \neq x} U_y$ is open and $\{x\}$ is closed. Conversely, if $\{x\}$ is
closed, then $\{x\}^c$ is an open set containing every $y \neq x$. ∎
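On a finite space the axioms $T_0$ through $T_2$ are finite checks. The Python sketch below (names ours; $X$ assumed finite) confirms that the two-point space with open sets $\varnothing, \{1\}, \{0, 1\}$ (often called the Sierpiński space) is $T_0$ but not $T_1$, and compares the $T_1$ test with the criterion of Proposition 4.7. On a finite set a $T_1$ topology is necessarily discrete (every set is a finite union of closed points), so non-Hausdorff $T_1$ examples such as the cofinite and Zariski topologies require infinite sets.

```python
from itertools import combinations, permutations

def is_T0(X, T):
    """For all x != y, some open set contains exactly one of x, y."""
    T = [frozenset(U) for U in T]
    return all(any((x in U) != (y in U) for U in T) for x, y in combinations(X, 2))

def is_T1(X, T):
    """For all ordered pairs x != y, some open set contains y but not x."""
    T = [frozenset(U) for U in T]
    return all(any(y in U and x not in U for U in T) for x, y in permutations(X, 2))

def is_T2(X, T):
    """For all x != y, there are disjoint open U, V with x in U and y in V."""
    T = [frozenset(U) for U in T]
    return all(
        any(x in U and y in V and not (U & V) for U in T for V in T)
        for x, y in combinations(X, 2)
    )

def singletons_closed(X, T):
    """Each {x} is closed, i.e. X minus {x} is open (Proposition 4.7)."""
    X, T = frozenset(X), {frozenset(U) for U in T}
    return all(X - {x} in T for x in X)

sierpinski = ({0, 1}, [set(), {1}, {0, 1}])
discrete = ({0, 1}, [set(), {0}, {1}, {0, 1}])
trivial = ({0, 1}, [set(), {0, 1}])

assert is_T0(*sierpinski) and not is_T1(*sierpinski)
assert is_T2(*discrete) and not is_T0(*trivial)
for space in (sierpinski, discrete, trivial):
    assert is_T1(*space) == singletons_closed(*space)
```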


The vast majority of topological spaces that arise in practice (and, in particular, in
this book) are Hausdorff, or become Hausdorff after simple modifications. (This last
phrase refers to spaces such as the space of integrable functions on a measure space,
which becomes a Hausdorff space with the $L^1$ metric when we identify two functions
that are equal a.e.) However, two classes of (usually) non-Hausdorff topologies
are of sufficient importance to warrant special mention: the quotient topology on a
space of equivalence classes, discussed in Exercises 28 and 29 (§4.2), and the Zariski
topology on an algebraic variety. Without attempting to give the definition of an
algebraic variety, we shall describe the Zariski topology on a vector space.

Let $k$ be a field, and let $k[X_1, \dots, X_n]$ be the ring of polynomials in $n$ variables
over $k$. Each $P \in k[X_1, \dots, X_n]$ determines a polynomial map $p : k^n \to k$ by
substituting elements of $k$ for the formal indeterminates $X_1, \dots, X_n$. The
correspondence $P \mapsto p$ is one-to-one precisely when $k$ is infinite. The collection of all
sets $p^{-1}(\{0\})$ in $k^n$, as $p$ ranges over all polynomial maps, is closed under finite
unions, since $p^{-1}(\{0\}) \cup q^{-1}(\{0\}) = (pq)^{-1}(\{0\})$, and it contains $k^n$ itself (take
$p = 0$). Hence, by Propositions 4.2 and 4.3, the collection of all sets of the form
$\bigcap_{\alpha \in A} p_\alpha^{-1}(\{0\})$ ($p_\alpha$ being a polynomial map for each $\alpha$) is the collection of closed
sets for a topology on $k^n$, called the Zariski topology. The Zariski topology is $T_1$
by Proposition 4.7, for if $a = (a_1, \dots, a_n) \in k^n$ then $\{a\} = \bigcap_1^n p_j^{-1}(\{0\})$ where
$p_j(X_1, \dots, X_n) = X_j - a_j$. If $k$ is finite the Zariski topology is discrete, but if $k$ is
infinite the Zariski topology is not Hausdorff; in fact, any two nonempty open sets
have nonempty intersection. This is just a restatement of the fact that $k[X_1, \dots, X_n]$
is an integral domain, that is, if $P$ and $Q$ are nonzero polynomials, then $PQ$ is
nonzero. (For $n = 1$, the Zariski topology is the cofinite topology.)
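The statement that the Zariski topology is discrete when $k$ is finite can be seen concretely for $n = 1$: every subset $S \subset k$ is the zero set of $\prod_{a \in S}(X - a)$, the empty product being the constant $1$. The Python sketch below checks this exhaustively for $k = \mathbb{Z}/5\mathbb{Z}$; the choice $p = 5$ and the helper names are ours. For general $n$ the same conclusion follows because points are closed and every subset of $k^n$ is a finite union of points.

```python
from itertools import combinations

p = 5          # k = Z/pZ; p must be prime so that k is a field
k = range(p)

def vanishing_polynomial(S):
    """The polynomial map x -> prod_{a in S} (x - a) over Z/pZ (constant 1 if S is empty)."""
    def poly(x):
        value = 1
        for a in S:
            value = value * (x - a) % p
        return value
    return poly

def zero_set(poly):
    """The set poly^{-1}({0}) as a subset of k."""
    return frozenset(x for x in k if poly(x) == 0)

all_subsets = [frozenset(c) for r in range(p + 1) for c in combinations(k, r)]

# Every subset of k is Zariski-closed, hence also open: the topology is discrete.
assert all(zero_set(vanishing_polynomial(S)) == S for S in all_subsets)

# The identity p^{-1}({0}) ∪ q^{-1}({0}) = (pq)^{-1}({0}) used in the text
P, Q = vanishing_polynomial({1, 2}), vanishing_polynomial({2, 4})
assert zero_set(P) | zero_set(Q) == zero_set(lambda x: P(x) * Q(x) % p)
```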

Other examples illustrating the separation and countability axioms will be found 
in the exercises. 


Exercises 


1. If $\operatorname{card}(X) \ge 2$, there is a topology on $X$ that is $T_0$ but not $T_1$.
2. If $X$ is an infinite set, the cofinite topology on $X$ is $T_1$ but not $T_2$, and is first
countable iff $X$ is countable.


3. Every metric space is normal. (If $A, B$ are closed sets in the metric space $(X, \rho)$,
consider the sets of points $x$ where $\rho(x, A) < \rho(x, B)$ or $\rho(x, A) > \rho(x, B)$.)


4. Let $X = \mathbb{R}$, and let $\mathcal{T}$ be the family of all subsets of $\mathbb{R}$ of the form $U \cup (V \cap \mathbb{Q})$
where $U, V$ are open in the usual sense. Then $\mathcal{T}$ is a topology that is Hausdorff but
not regular. (In view of Exercise 3, this shows that a topology stronger than a normal
topology need not be normal or even regular.)


5. Every separable metric space is second countable. 


6. Let $\mathcal{E} = \{(a, b] : -\infty < a < b < \infty\}$.
a. $\mathcal{E}$ is a base for a topology $\mathcal{T}$ on $\mathbb{R}$ in which the members of $\mathcal{E}$ are both open
and closed.
b. $\mathcal{T}$ is first countable but not second countable. (If $x \in \mathbb{R}$, every neighborhood
base at $x$ contains a set whose supremum is $x$.)
c. $\mathbb{Q}$ is dense in $\mathbb{R}$ with respect to $\mathcal{T}$. (Thus the converse of Proposition 4.5 is
false.)


7. If $X$ is a topological space, a point $x \in X$ is called a cluster point of the
sequence $\{x_j\}$ if for every neighborhood $U$ of $x$, $x_j \in U$ for infinitely many $j$. If $X$
is first countable, $x$ is a cluster point of $\{x_j\}$ iff some subsequence of $\{x_j\}$ converges
to $x$.


8. If $X$ is an infinite set with the cofinite topology and $\{x_j\}$ is a sequence of distinct
points in $X$, then $x_j \to x$ for every $x \in X$.


9. If $X$ is a linearly ordered set, the topology $\mathcal{T}$ generated by the sets $\{x : x < a\}$
and $\{x : x > a\}$ ($a \in X$) is called the order topology.
a. If $a, b \in X$ and $a < b$, there exist $U, V \in \mathcal{T}$ with $a \in U$, $b \in V$, and $x < y$
for all $x \in U$ and $y \in V$. The order topology is the weakest topology with this
property.
b. If $Y \subset X$, the order topology on $Y$ is never stronger than, but may be weaker
than, the relative topology on $Y$ induced by the order topology on $X$.
c. The order topology on $\mathbb{R}$ is the usual topology.


10. A topological space $X$ is called disconnected if there exist nonempty open sets
$U, V$ such that $U \cap V = \varnothing$ and $U \cup V = X$; otherwise $X$ is connected. When we
speak of connected or disconnected subsets of $X$, we refer to the relative topology
on them.
a. $X$ is connected iff $\varnothing$ and $X$ are the only subsets of $X$ that are both open and
closed.
b. If $\{E_\alpha\}_{\alpha \in A}$ is a collection of connected subsets of $X$ such that $\bigcap_{\alpha \in A} E_\alpha \neq \varnothing$,
then $\bigcup_{\alpha \in A} E_\alpha$ is connected.
c. If $A \subset X$ is connected, then $\overline{A}$ is connected.
d. Every point $x \in X$ is contained in a unique maximal connected subset of $X$,
and this subset is closed. (It is called the connected component of $x$.)


11. If $E_1, \dots, E_n$ are subsets of a topological space, the closure of $\bigcup_1^n E_j$ is $\bigcup_1^n \overline{E_j}$.


12. Let $X$ be a set. A Kuratowski closure operator on $X$ is a map $A \mapsto A^*$ from
$\mathcal{P}(X)$ to itself satisfying (i) $\varnothing^* = \varnothing$, (ii) $A \subset A^*$ for all $A$, (iii) $(A^*)^* = A^*$ for all
$A$, and (iv) $(A \cup B)^* = A^* \cup B^*$ for all $A, B$.
a. If $X$ is a topological space, the map $A \mapsto \overline{A}$ is a Kuratowski closure operator.
(Use Exercise 11.)
b. Conversely, given a Kuratowski closure operator, let $\mathcal{F} = \{A \subset X : A = A^*\}$
and $\mathcal{T} = \{U \subset X : U^c \in \mathcal{F}\}$. Then $\mathcal{T}$ is a topology, and for any set $A \subset X$, $A^*$
is its closure with respect to $\mathcal{T}$.


13. If $X$ is a topological space, $U$ is open in $X$, and $A$ is dense in $X$, then $\overline{U} = \overline{U \cap A}$.


4.2 CONTINUOUS MAPS 


Topological spaces are the natural setting for the concept of continuity, which can
be described in either global or local terms as follows. Let $X$ and $Y$ be topological
spaces and $f$ a map from $X$ to $Y$. Then $f$ is called continuous if $f^{-1}(V)$ is open
in $X$ for every open $V \subset Y$. (Since $f^{-1}(A^c) = [f^{-1}(A)]^c$, an equivalent condition
is that $f^{-1}(A)$ is closed in $X$ for every closed $A \subset Y$.) If $x \in X$, $f$ is called
continuous at $x$ if for every neighborhood $V$ of $f(x)$ there is a neighborhood $U$ of
$x$ such that $f(U) \subset V$, or equivalently, if $f^{-1}(V)$ is a neighborhood of $x$ for every
neighborhood $V$ of $f(x)$. Clearly, if $f : X \to Y$ and $g : Y \to Z$ are continuous (or
$f$ is continuous at $x$ and $g$ is continuous at $f(x)$), then $g \circ f$ is continuous (at $x$). We
shall denote the set of continuous maps from $X$ to $Y$ by $C(X, Y)$.
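For maps between finite spaces, the global definition (preimages of open sets are open) is a direct check. The Python sketch below (names ours; a map is represented as a dictionary) shows that on the two-point space with open sets $\varnothing, \{1\}, \{0, 1\}$ the identity is continuous but the map exchanging $0$ and $1$ is not, and that every map out of a discrete space is continuous.

```python
def preimage(f, V, X):
    """f^{-1}(V) for a map f: X -> Y given as a dict."""
    return frozenset(x for x in X if f[x] in V)

def is_continuous(f, X, TX, TY):
    """f is continuous iff f^{-1}(V) is open in X for every open V in Y."""
    TX = {frozenset(U) for U in TX}
    return all(preimage(f, V, X) in TX for V in TY)

X = {0, 1}
sierpinski = [set(), {1}, {0, 1}]
discrete = [set(), {0}, {1}, {0, 1}]

identity = {0: 0, 1: 1}
swap = {0: 1, 1: 0}

assert is_continuous(identity, X, sierpinski, sierpinski)
assert not is_continuous(swap, X, sierpinski, sierpinski)  # f^{-1}({1}) = {0} is not open
assert is_continuous(swap, X, discrete, sierpinski)        # any map out of a discrete space
assert not is_continuous(identity, X, sierpinski, discrete)
```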


4.8 Proposition. The map $f : X \to Y$ is continuous iff $f$ is continuous at every
$x \in X$.


Proof. If $f$ is continuous and $V$ is a neighborhood of $f(x)$, then $f^{-1}(V^\circ)$ is an open
set containing $x$ and contained in $f^{-1}(V)$, so $f$ is continuous at $x$. Conversely, suppose that $f$ is continuous at
each $x \in X$. If $V \subset Y$ is open, $V$ is a neighborhood of each of its points, so $f^{-1}(V)$
is a neighborhood of each of its points. Thus $f^{-1}(V)$ is open, so $f$ is continuous. ∎


4.9 Proposition. If the topology on $Y$ is generated by a family of sets $\mathcal{E}$, then
$f : X \to Y$ is continuous iff $f^{-1}(V)$ is open in $X$ for every $V \in \mathcal{E}$.


Proof. This is clear from Proposition 4.4 and the fact that the set mapping $f^{-1}$
commutes with unions and intersections. ∎


If $f : X \to Y$ is bijective and $f$ and $f^{-1}$ are both continuous, $f$ is called a
homeomorphism, and $X$ and $Y$ are said to be homeomorphic. In this case the
set mapping $f^{-1}$ is a bijection from the open sets in $Y$ to the open sets in $X$, so
$X$ and $Y$ may be considered identical as far as their topological properties go. If
$f : X \to Y$ is injective but not surjective, and $f : X \to f(X)$ is a homeomorphism
when $f(X) \subset Y$ is given the relative topology, $f$ is called an embedding.

If $X$ is any set and $\{f_\alpha : X \to Y_\alpha\}_{\alpha \in A}$ is a family of maps from $X$ into some
topological spaces $Y_\alpha$, there is a unique weakest topology $\mathcal{T}$ on $X$ that makes all the
$f_\alpha$ continuous; it is called the weak topology generated by $\{f_\alpha\}_{\alpha \in A}$. Namely, $\mathcal{T}$ is
the topology generated by sets of the form $f_\alpha^{-1}(U_\alpha)$ where $\alpha \in A$ and $U_\alpha$ is open in
$Y_\alpha$.

The most important example of this construction is the Cartesian product of
topological spaces. If $\{X_\alpha\}_{\alpha \in A}$ is any family of topological spaces, the product
topology on $X = \prod_{\alpha \in A} X_\alpha$ is the weak topology generated by the coordinate
maps $\pi_\alpha : X \to X_\alpha$. When we consider a Cartesian product of topological spaces,
we always endow it with the product topology unless we specify otherwise. By
Proposition 4.4, a base for the product topology is given by the sets of the form
$\bigcap_1^n \pi_{\alpha_j}^{-1}(U_{\alpha_j})$ where $n \in \mathbb{N}$ and $U_{\alpha_j}$ is open in $X_{\alpha_j}$ for $1 \le j \le n$. These sets
can also be written as $\prod_{\alpha \in A} U_\alpha$ where $U_\alpha = X_\alpha$ if $\alpha \neq \alpha_1, \dots, \alpha_n$. Notice, in
particular, that if $A$ is infinite, a product of nonempty open sets $\prod_{\alpha \in A} U_\alpha$ is open in
$\prod_{\alpha \in A} X_\alpha$ iff $U_\alpha = X_\alpha$ for all but finitely many $\alpha$.

4.10 Proposition. If $X_\alpha$ is Hausdorff for each $\alpha \in A$, then $X = \prod_{\alpha \in A} X_\alpha$ is
Hausdorff.


Proof. If $x$ and $y$ are distinct points of $X$, we must have $\pi_\alpha(x) \neq \pi_\alpha(y)$ for
some $\alpha$. Let $U$ and $V$ be disjoint neighborhoods of $\pi_\alpha(x)$ and $\pi_\alpha(y)$ in $X_\alpha$. Then
$\pi_\alpha^{-1}(U)$ and $\pi_\alpha^{-1}(V)$ are disjoint neighborhoods of $x$ and $y$ in $X$. ∎




4.11 Proposition. If $X_\alpha$ ($\alpha \in A$) and $Y$ are topological spaces and $X = \prod_{\alpha \in A} X_\alpha$,
then $f : Y \to X$ is continuous iff $\pi_\alpha \circ f$ is continuous for each $\alpha$.


Proof. If $\pi_\alpha \circ f$ is continuous for each $\alpha$, then $f^{-1}(\pi_\alpha^{-1}(U_\alpha))$ is open in $Y$ for
each open $U_\alpha$ in $X_\alpha$. By Proposition 4.9, $f$ is continuous. The converse is obvious. ∎


If the spaces $X_\alpha$ are all equal to some fixed space $X$, the product $\prod_{\alpha \in A} X_\alpha$ is just
the set $X^A$ of mappings from $A$ to $X$, and the product topology is just the topology
of pointwise convergence. More precisely:


4.12 Proposition. If $X$ is a topological space, $A$ is a nonempty set, and $\{f_n\}$ is a
sequence in $X^A$, then $f_n \to f$ in the product topology iff $f_n \to f$ pointwise.


Proof. The sets

$$N(U_1, \dots, U_k) = \bigcap_1^k \pi_{\alpha_j}^{-1}(U_j) = \{g \in X^A : g(\alpha_j) \in U_j \text{ for } 1 \le j \le k\},$$

where $k \in \mathbb{N}$, $\alpha_1, \dots, \alpha_k \in A$, and $U_j$ is a neighborhood of $f(\alpha_j)$ in $X$ for each $j$, form a neighborhood
base for the product topology at $f$. If $f_n \to f$ pointwise, then $f_n(\alpha_j) \in U_j$ for
$n > N_j$ and hence $f_n \in N(U_1, \dots, U_k)$ for $n > \max(N_1, \dots, N_k)$; therefore
$f_n \to f$ in the product topology. Conversely, if $f_n \to f$ in the product topology,
$\alpha \in A$, and $U$ is a neighborhood of $f(\alpha)$, then $f_n \in N(U) = \pi_\alpha^{-1}(U)$ for large $n$;
hence $f_n(\alpha) \in U$ for large $n$, and so $f_n(\alpha) \to f(\alpha)$. ∎
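The proof shows that a basic neighborhood of $f$ in $X^A$ constrains only finitely many values. A numerical illustration, under the hypothetical choices $A = [0, 1]$, $X = \mathbb{R}$, and $f_n(x) = x^n$: for finitely many points one can compute the thresholds $N_j$ from the proof and take their maximum, while no single $N$ works on all of $[0, 1]$ simultaneously, because $f_n(2^{-1/n}) = 1/2$ for every $n$. So the convergence is pointwise (product topology) but not uniform. This Python sketch is ours and is not part of the proof.

```python
import math

def f_n(n, x):
    return x ** n

def f(x):
    """Pointwise limit of x**n on [0, 1]."""
    return 1.0 if x == 1 else 0.0

def threshold(x, eps):
    """Smallest N with |f_n(x) - f(x)| < eps for all n >= N, for x in [0, 1]."""
    if x in (0.0, 1.0):
        return 1
    # x**n < eps  <=>  n > log(eps) / log(x), since log(x) < 0
    return math.floor(math.log(eps) / math.log(x)) + 1

points, eps = [0.5, 0.9, 0.99, 1.0], 0.01
N = max(threshold(x, eps) for x in points)   # N = max(N_1, ..., N_k) as in the proof

# For n >= N, f_n lies in the basic neighborhood determined by these points and eps.
assert all(abs(f_n(n, x) - f(x)) < eps for x in points for n in range(N, N + 100))

# But sup_x |f_n(x) - f(x)| >= 1/2 for every n, so convergence is not uniform.
for n in (10, 1000, 100000):
    x = 0.5 ** (1 / n)
    assert x < 1 and abs(f_n(n, x) - 0.5) < 1e-9
```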


We shall be particularly interested in real- and complex-valued functions on topological spaces. If $X$ is any set, we denote by $B(X, \mathbb{R})$ (resp. $B(X, \mathbb{C})$) the space
of all bounded real- (resp. complex-) valued functions on $X$. If $X$ is a topological
space, we also have the spaces $C(X, \mathbb{R})$ and $C(X, \mathbb{C})$ of continuous functions on
$X$, and we define


$$BC(X, F) = B(X, F) \cap C(X, F) \qquad (F = \mathbb{R} \text{ or } \mathbb{C}).$$


In speaking of complex-valued functions we shall usually omit the $\mathbb{C}$ and simply
write $B(X)$, $C(X)$, and $BC(X)$. Since addition and multiplication are continuous
from $\mathbb{C} \times \mathbb{C}$ to $\mathbb{C}$, $C(X)$ and $BC(X)$ are complex vector spaces.

If $f \in B(X)$, we define the uniform norm of $f$ to be

$$\|f\|_u = \sup\{|f(x)| : x \in X\}.$$


The function $\rho(f, g) = \|f - g\|_u$ is easily seen to be a metric on $B(X)$, and
convergence with respect to this metric is simply uniform convergence on $X$. $B(X)$
is obviously complete in the uniform metric: If $\{f_n\}$ is uniformly Cauchy, then
$\{f_n(x)\}$ is Cauchy for each $x$, and if we set $f(x) = \lim_n f_n(x)$, it is easily verified
that $\|f_n - f\|_u \to 0$.


4.13 Proposition. If $X$ is a topological space, $BC(X)$ is a closed subspace of
$B(X)$ in the uniform metric; in particular, $BC(X)$ is complete.


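Proof. Suppose $\{f_n\} \subset BC(X)$ and $\|f_n - f\|_u \to 0$. Given $\varepsilon > 0$, choose $N$
so large that $\|f_n - f\|_u < \varepsilon/3$ for $n > N$. Given $n > N$ and $x \in X$, since $f_n$ is
continuous at $x$ there is a neighborhood $U$ of $x$ such that $|f_n(y) - f_n(x)| < \varepsilon/3$ for
$y \in U$. But then

$$|f(y) - f(x)| \le |f(y) - f_n(y)| + |f_n(y) - f_n(x)| + |f_n(x) - f(x)| < \frac{\varepsilon}{3} + \frac{\varepsilon}{3} + \frac{\varepsilon}{3} = \varepsilon,$$

so $f$ is continuous at $x$. By Proposition 4.8, $f$ is continuous. ∎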


For a given topological space $X$ it may happen that $C(X)$ consists only of constant
functions. This is obviously the case, for example, if X has the trivial topology, but 
it can happen even when X is regular. Normal spaces, however, always have plenty 
of continuous functions, as the following fundamental theorems show. 


4.14 Lemma. Suppose that $A$ and $B$ are disjoint closed subsets of the normal space
$X$, and let $\Delta = \{k2^{-n} : n \ge 1 \text{ and } 0 < k < 2^n\}$ be the set of dyadic rational
numbers in $(0, 1)$. There is a family $\{U_r : r \in \Delta\}$ of open sets in $X$ such that
$A \subset U_r \subset B^c$ for all $r \in \Delta$ and $\overline{U_r} \subset U_s$ for $r < s$.
Proof. By normality, there exist disjoint open sets $V, W$ such that $A \subset V$,
$B \subset W$. Let $U_{1/2} = V$. Then since $W^c$ is closed,

$$A \subset U_{1/2} \subset \overline{U_{1/2}} \subset W^c \subset B^c.$$

We now select $U_r$ for $r = k2^{-n}$ by induction on $n$. Suppose that we have chosen $U_r$
for $r = k2^{-n}$ when $0 < k < 2^n$ and $n \le N - 1$. To find $U_r$ for $r = (2j + 1)2^{-N}$
($0 \le j < 2^{N-1}$), observe that $\overline{U_{j2^{1-N}}}$ and $(U_{(j+1)2^{1-N}})^c$ are disjoint closed sets
(where we set $U_0 = A$ and $U_1 = B^c$), so as above we can choose an open $U_r$ with

$$A \subset \overline{U_{j2^{1-N}}} \subset U_r \subset \overline{U_r} \subset U_{(j+1)2^{1-N}} \subset B^c.$$

These $U_r$'s clearly have the desired properties. ∎


4.15 Urysohn's Lemma. Let $X$ be a normal space. If $A$ and $B$ are disjoint closed
sets in $X$, there exists $f \in C(X, [0, 1])$ such that $f = 0$ on $A$ and $f = 1$ on $B$.


Proof. Let $U_r$ be as in Lemma 4.14 for $r \in \Delta$, and set $U_1 = X$. For $x \in X$,
define $f(x) = \inf\{r : x \in U_r\}$. Since $A \subset U_r \subset B^c$ for $0 < r < 1$, we clearly have
$f(x) = 0$ for $x \in A$ and $f(x) = 1$ for $x \in B$, and $0 \le f(x) \le 1$ for all $x \in X$. It
remains to show that $f$ is continuous. To this end, observe that $f(x) < \alpha$ iff $x \in U_r$
for some $r < \alpha$ iff $x \in \bigcup_{r < \alpha} U_r$, so $f^{-1}((-\infty, \alpha)) = \bigcup_{r < \alpha} U_r$ is open. Also
$f(x) > \alpha$ iff $x \notin U_r$ for some $r > \alpha$ iff $x \notin \overline{U_s}$ for some $s > \alpha$ (since $\overline{U_s} \subset U_r$ for
$s < r$) iff $x \in \bigcup_{s > \alpha} (\overline{U_s})^c$, so $f^{-1}((\alpha, \infty)) = \bigcup_{s > \alpha} (\overline{U_s})^c$ is open. Since the open
half-lines generate the topology on $\mathbb{R}$, $f$ is continuous by Proposition 4.9. ∎


The proof of Urysohn’s lemma may seem somewhat opaque at first, but there is a 
simple geometric intuition behind it. If one pictures $X$ as the plane $\mathbb{R}^2$ and the sets
$U_r$ as regions bounded by curves, the curves $\partial U_r$ form a "topographic map" of the
function $f$.
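When $X$ is a metric space (normal by Exercise 3), the dyadic construction can be bypassed: $f(x) = \rho(x, A)/(\rho(x, A) + \rho(x, B))$ is continuous, equals $0$ on $A$ and $1$ on $B$, and its sublevel sets $\{f < r\}$ can serve as the sets $U_r$ of Lemma 4.14, which matches the "topographic map" picture. This is a standard alternative, not the method of the proof above. The Python sketch assumes $X = \mathbb{R}$ with $A$ and $B$ finite unions of closed intervals; the helper names are ours.

```python
def dist_to_interval(x, interval):
    """Distance from x to the closed interval [lo, hi]."""
    lo, hi = interval
    return max(lo - x, 0.0, x - hi)

def dist_to_set(x, intervals):
    """Distance from x to a finite union of closed intervals."""
    return min(dist_to_interval(x, I) for I in intervals)

def urysohn(x, A, B):
    """rho(x, A) / (rho(x, A) + rho(x, B)); the denominator is positive
    because A and B are disjoint closed sets."""
    dA, dB = dist_to_set(x, A), dist_to_set(x, B)
    return dA / (dA + dB)

A = [(0.0, 1.0), (5.0, 6.0)]
B = [(2.0, 3.0)]

assert urysohn(0.5, A, B) == 0.0 and urysohn(5.5, A, B) == 0.0   # f = 0 on A
assert urysohn(2.5, A, B) == 1.0                                  # f = 1 on B
assert urysohn(1.5, A, B) == 0.5                                  # halfway between
assert all(0.0 <= urysohn(t / 10, A, B) <= 1.0 for t in range(-50, 100))
```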


4.16 The Tietze Extension Theorem. Let $X$ be a normal space. If $A$ is a closed
subset of $X$ and $f \in C(A, [a, b])$, there exists $F \in C(X, [a, b])$ such that $F|A = f$.


Proof. Replacing $f$ by $(f - a)/(b - a)$, we may assume that $[a, b] = [0, 1]$.
We claim that there is a sequence $\{g_n\}$ of continuous functions on $X$ such that
$0 \le g_n \le 2^{n-1}/3^n$ on $X$ and $0 \le f - \sum_1^n g_j \le (2/3)^n$ on $A$. To begin with, let
$B = f^{-1}([0, 1/3])$ and $C = f^{-1}([2/3, 1])$. These are closed subsets of $A$, and since
$A$ itself is closed, they are closed in $X$. By Urysohn's lemma there is a continuous
$g_1 : X \to [0, 1/3]$ with $g_1 = 0$ on $B$ and $g_1 = 1/3$ on $C$; it follows that
$0 \le f - g_1 \le 2/3$ on $A$. Having found $g_1, \dots, g_{n-1}$, by the same reasoning we can find
$g_n : X \to [0, 2^{n-1}/3^n]$ such that $g_n = 0$ on the set where $f - \sum_1^{n-1} g_j \le 2^{n-1}/3^n$
and $g_n = 2^{n-1}/3^n$ on the set where $f - \sum_1^{n-1} g_j \ge (2/3)^n$. Let $F = \sum_1^\infty g_n$.
Since $\|g_n\|_u \le 2^{n-1}/3^n$, the partial sums of this series converge uniformly, so $F$ is
continuous by Proposition 4.13, and $0 \le F \le \sum_1^\infty 2^{n-1}/3^n = 1$. Moreover, on $A$ we have $0 \le f - F \le (2/3)^n$ for
all $n$, whence $F = f$ on $A$. ∎
4.17 Corollary. If $X$ is normal, $A \subset X$ is closed, and $f \in C(A)$, there exists
$F \in C(X)$ such that $F|A = f$.


Proof. By considering real and imaginary parts separately, it suffices to assume
that $f$ is real-valued. Let $g = f/(1 + |f|)$. Then $g \in C(A, (-1, 1))$, so there exists
$G \in C(X, [-1, 1])$ with $G|A = g$. Let $B = G^{-1}(\{-1, 1\})$; this set is closed, and it is disjoint from $A$ because $|g| < 1$ on $A$. By Urysohn's lemma
there exists $h \in C(X, [0, 1])$ with $h = 1$ on $A$, $h = 0$ on $B$. Then $hG = G$ on $A$
and $|hG| < 1$ everywhere, so $F = hG/(1 - |hG|)$ does the job. ∎
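The two formulas in this proof are the homeomorphism $t \mapsto t/(1 + |t|)$ of $\mathbb{R}$ onto $(-1, 1)$ and its inverse $s \mapsto s/(1 - |s|)$: if $s = t/(1 + |t|)$ then $1 - |s| = 1/(1 + |t|)$, so $s/(1 - |s|) = t$. The cutoff $h$ is needed because $G$ may take the values $\pm 1$ off $A$, where the inverse formula is undefined. A quick numerical sanity check in Python (function names ours):

```python
def squash(t):
    """R -> (-1, 1), t -> t / (1 + |t|); this is how g is formed from f."""
    return t / (1 + abs(t))

def unsquash(s):
    """(-1, 1) -> R, s -> s / (1 - |s|); this is how F is formed from hG."""
    return s / (1 - abs(s))

samples = [-1e6, -3.5, -1.0, 0.0, 0.25, 2.0, 1e6]
for t in samples:
    s = squash(t)
    assert -1 < s < 1
    assert abs(unsquash(s) - t) <= 1e-6 * max(1.0, abs(t))
```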


A topological space $X$ is called completely regular if $X$ is $T_1$ and for each closed
$A \subset X$ and each $x \notin A$ there exists $f \in C(X, [0, 1])$ such that $f(x) = 1$ and $f = 0$
on $A$. Completely regular spaces are also called Tychonoff spaces or $T_{3\frac{1}{2}}$ spaces.
The latter terminology is justified, for every completely regular space is $T_3$ (if $A$, $x$,
$f$ are as above, then $f^{-1}((\frac{1}{2}, \infty))$ and $f^{-1}((-\infty, \frac{1}{2}))$ are disjoint neighborhoods of
$x$ and $A$), and Urysohn's lemma shows that every $T_4$ space is completely regular.


Exercises 


14. If $X$ and $Y$ are topological spaces, $f : X \to Y$ is continuous iff $f(\overline{A}) \subset \overline{f(A)}$
for all $A \subset X$ iff $\overline{f^{-1}(B)} \subset f^{-1}(\overline{B})$ for all $B \subset Y$.


15. If $X$ is a topological space, $A \subset X$ is closed, and $g \in C(A)$ satisfies $g = 0$ on
$\partial A$, then the extension of $g$ to $X$ defined by $g(x) = 0$ for $x \in A^c$ is continuous.





16. Let X be a topological space, Y a Hausdorff space, and f,g continuous maps 
from X to Y. 

a. $\{x : f(x) = g(x)\}$ is closed.

b. If f = g on a dense subset of X, then f = g on all of X. 


17. If $X$ is a set, $\mathcal{F}$ a collection of real-valued functions on $X$, and $\mathcal{T}$ the weak
topology generated by $\mathcal{F}$, then $\mathcal{T}$ is Hausdorff iff for every $x, y \in X$ with $x \neq y$ there
exists $f \in \mathcal{F}$ with $f(x) \neq f(y)$.


18. If $X$ and $Y$ are topological spaces and $y_0 \in Y$, then $X$ is homeomorphic to
$X \times \{y_0\}$ where the latter has the relative topology as a subset of $X \times Y$.


19. If $\{X_\alpha\}$ is a family of topological spaces, $X = \prod_\alpha X_\alpha$ (with the product
topology) is uniquely determined up to homeomorphism by the following property:
There exist continuous maps $\pi_\alpha : X \to X_\alpha$ such that if $Y$ is any topological space
and $f_\alpha \in C(Y, X_\alpha)$ for each $\alpha$, there is a unique $F \in C(Y, X)$ such that $f_\alpha = \pi_\alpha \circ F$.
(Thus $X$ is the category-theoretic product of the $X_\alpha$'s in the category of topological
spaces.)


20. If $A$ is a countable set and $X_\alpha$ is a first (resp. second) countable space for each
$\alpha \in A$, then $\prod_{\alpha \in A} X_\alpha$ is first (resp. second) countable.


21. If $X$ is an infinite set with the cofinite topology, then every $f \in C(X)$ is constant.


22. Let $X$ be a topological space, $(Y, \rho)$ a complete metric space, and $\{f_n\}$ a
sequence in $Y^X$ such that $\sup_{x \in X} \rho(f_n(x), f_m(x)) \to 0$ as $m, n \to \infty$. There is
a unique $f \in Y^X$ such that $\sup_{x \in X} \rho(f_n(x), f(x)) \to 0$ as $n \to \infty$. If each $f_n$ is
continuous, so is $f$.


23. Give an elementary proof of the Tietze extension theorem for the case X = R. 


24. A Hausdorff space X is normal iff X satisfies the conclusion of Urysohn’s lemma 
iff X satisfies the conclusion of the Tietze extension theorem. 


25. If $(X, \mathcal{T})$ is completely regular, then $\mathcal{T}$ is the weak topology generated by $C(X)$.


26. Let $X$ and $Y$ be topological spaces.
a. If $X$ is connected (see Exercise 10) and $f \in C(X, Y)$, then $f(X)$ is connected.
b. $X$ is called arcwise connected if for all $x_0, x_1 \in X$ there exists
$f \in C([0, 1], X)$ with $f(0) = x_0$ and $f(1) = x_1$. Every arcwise connected space is
connected.
c. Let $X = \{(s, t) \in \mathbb{R}^2 : t = \sin(s^{-1})\} \cup \{(0, 0)\}$, with the relative topology
induced from $\mathbb{R}^2$. Then $X$ is connected but not arcwise connected.


27. If $X_\alpha$ is connected for each $\alpha \in A$ (see Exercise 10), then $X = \prod_{\alpha \in A} X_\alpha$ is
connected. (Fix $x \in X$ and let $Y$ be the connected component of $x$ in $X$. Show that
$Y$ includes $\{y \in X : \pi_\alpha(y) = \pi_\alpha(x) \text{ for all but finitely many } \alpha\}$ and that the latter
set is dense in $X$. Use Exercises 10 and 18.)


28. Let $X$ be a topological space equipped with an equivalence relation, $\widetilde{X}$ the set
of equivalence classes, $\pi : X \to \widetilde{X}$ the map taking each $x \in X$ to its equivalence
class, and $\mathcal{T} = \{U \subset \widetilde{X} : \pi^{-1}(U) \text{ is open in } X\}$.

a. $\mathcal{T}$ is a topology on $\widetilde{X}$. (It is called the quotient topology.)

b. If $Y$ is a topological space, $f : \widetilde{X} \to Y$ is continuous iff $f \circ \pi$ is continuous.

c. $\widetilde{X}$ is $T_1$ iff every equivalence class is closed.


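29. If $X$ is a topological space and $G$ is a group of homeomorphisms from $X$ to
itself, $G$ induces an equivalence relation on $X$, namely, $x \sim y$ iff $y = g(x)$ for some
$g \in G$. Let $X = \mathbb{R}^2$; describe the quotient space $\widetilde{X}$ and the quotient topology on
it (as in Exercise 28) for each of the following groups of invertible linear maps. In
particular, show that in (a) the quotient space is homeomorphic to $[0, \infty)$; in (b) it is
$T_1$ but not Hausdorff; in (c) it is $T_0$ but not $T_1$; and in (d) it is not $T_0$. (In fact, in (d)
$\widetilde{X}$ is uncountable, but there are only six open sets and there are points $p \in \widetilde{X}$ such
that $\overline{\{p\}} = \widetilde{X}$.)

a. $\left\{\begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix} : \theta \in \mathbb{R}\right\}$
b. $\left\{\begin{pmatrix} 1 & a \\ 0 & 1 \end{pmatrix} : a \in \mathbb{R}\right\}$
c. $\left\{\begin{pmatrix} \cdots \end{pmatrix} : a > 0,\ b \in \mathbb{R}\right\}$
d. $\left\{\begin{pmatrix} a & 0 \\ 0 & b \end{pmatrix} : a, b \in \mathbb{Q} \setminus \{0\}\right\}$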
4.3 NETS


As we have hinted above, sequential convergence does not play the same central
role in general topological spaces as it does in metric spaces. The reasons for
this may be illustrated by the following example. Consider the space $\mathbb{C}^{\mathbb{R}}$ of all
complex-valued functions on $\mathbb{R}$, with the product topology (i.e., the topology of
pointwise convergence), and its subspace $C(\mathbb{R})$. On the one hand, by Corollary 2.9,
if $\{f_n\} \subset C(\mathbb{R})$ and $f_n \to f$ pointwise, then $f$ is Borel measurable, so the set of
limits of convergent sequences in $C(\mathbb{R})$ is a proper subset of $\mathbb{C}^{\mathbb{R}}$. Nonetheless, $C(\mathbb{R})$
is dense in $\mathbb{C}^{\mathbb{R}}$. Indeed, if $f \in \mathbb{C}^{\mathbb{R}}$, the sets

$$\{g \in \mathbb{C}^{\mathbb{R}} : |g(x_j) - f(x_j)| < \varepsilon \text{ for } j = 1, \dots, n\} \qquad (n \in \mathbb{N},\ x_1, \dots, x_n \in \mathbb{R},\ \varepsilon > 0)$$

form a neighborhood base at $f$, and each of these sets clearly contains continuous
functions.

There is, however, a generalization of the notion of sequence that works well in 
arbitrary topological spaces; the key idea is to use index sets more general than N. 
The precise definitions are as follows. 

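A directed set is a set $A$ equipped with a binary relation $\precsim$ such that


• $\alpha \precsim \alpha$ for all $\alpha \in A$;
• if $\alpha \precsim \beta$ and $\beta \precsim \gamma$ then $\alpha \precsim \gamma$;
• for any $\alpha, \beta \in A$ there exists $\gamma \in A$ such that $\alpha \precsim \gamma$ and $\beta \precsim \gamma$.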


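If $\alpha \precsim \beta$, we shall also write $\beta \succsim \alpha$. A net in a set $X$ is a mapping $\alpha \mapsto x_\alpha$ from a
directed set $A$ into $X$. We shall usually denote such a mapping by $\langle x_\alpha \rangle_{\alpha \in A}$, or just
by $\langle x_\alpha \rangle$ if $A$ is understood, and we say that $\langle x_\alpha \rangle$ is indexed by $A$.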

Here are some examples of directed sets: 


i. The set of positive integers $\mathbb{N}$, with $j \precsim k$ iff $j \le k$.
ii. The set $\mathbb{R} \setminus \{a\}$ ($a \in \mathbb{R}$), with $x \precsim y$ iff $|x - a| \ge |y - a|$.


iii. The set of all partitions $\{x_j\}_0^n$ of the interval $[a, b]$ (i.e., $a = x_0 < \cdots < x_n = b$), with $\{x_j\}_0^n \precsim \{y_k\}_0^m$ iff $\max_j (x_j - x_{j-1}) \ge \max_k (y_k - y_{k-1})$.


iv. The set $\mathcal{N}$ of all neighborhoods of a point $x$ in a topological space $X$, with
$U \precsim V$ iff $U \supset V$. (We say that $\mathcal{N}$ is directed by reverse inclusion.)


v. The Cartesian product $A \times B$ of two directed sets, with $(\alpha, \beta) \precsim (\alpha', \beta')$ iff
$\alpha \precsim \alpha'$ and $\beta \precsim \beta'$. (This is always the way we make $A \times B$ into a directed
set.)


Examples (i)–(iii) occur in elementary analysis: A net indexed by $\mathbb{N}$ is just a
sequence, and the nets indexed by the sets in (ii) and (iii) occur in defining limits of
functions of a real variable and Riemann integrals. Example (iv) is of fundamental importance in
topology, and we shall see several uses of the construction in (v).
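Example (iii) can be made concrete: the Riemann sums of a function on $[a, b]$ form a net indexed by partitions, and Riemann integrability says this net converges. The Python sketch below is illustrative only (helper names ours; we use left endpoint sums for the hypothetical choice $f(x) = x^2$ on $[0, 1]$). It checks that the common refinement of two partitions lies beyond both in the ordering of (iii), which is why that relation is directed, and that the left sums are within $\max_j(x_j - x_{j-1})$ of $\int_0^1 x^2\,dx = 1/3$, a bound valid here because $f$ is increasing with $f(1) - f(0) = 1$.

```python
import random

def mesh(P):
    """Largest subinterval length of the partition P = [x_0, ..., x_n]."""
    return max(b - a for a, b in zip(P, P[1:]))

def precedes(P, Q):
    """P ≾ Q in example (iii): mesh(P) >= mesh(Q)."""
    return mesh(P) >= mesh(Q)

def common_refinement(P, Q):
    """All points of P and Q, sorted."""
    return sorted(set(P) | set(Q))

def left_riemann_sum(f, P):
    return sum(f(a) * (b - a) for a, b in zip(P, P[1:]))

def random_partition(a, b, n, rng):
    return [a] + sorted(rng.uniform(a, b) for _ in range(n - 1)) + [b]

def square(x):
    return x * x

rng = random.Random(0)

P, Q = random_partition(0.0, 1.0, 20, rng), random_partition(0.0, 1.0, 30, rng)
R = common_refinement(P, Q)
assert precedes(P, R) and precedes(Q, R)

for n in (10, 100, 1000, 10000):
    P = random_partition(0.0, 1.0, n, rng)
    # For increasing f: 0 <= 1/3 - S(P) <= mesh(P) * (f(1) - f(0))
    gap = 1 / 3 - left_riemann_sum(square, P)
    assert -1e-12 <= gap <= mesh(P) + 1e-12
```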

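Let $X$ be a topological space and $E$ a subset of $X$. A net $\langle x_\alpha \rangle_{\alpha \in A}$ is eventually
in $E$ if there exists $\alpha_0 \in A$ such that $x_\alpha \in E$ for $\alpha \succsim \alpha_0$, and $\langle x_\alpha \rangle$ is frequently in
$E$ if for every $\alpha \in A$ there exists $\beta \succsim \alpha$ such that $x_\beta \in E$. A point $x \in X$ is a limit
of $\langle x_\alpha \rangle$ (or $\langle x_\alpha \rangle$ converges to $x$, or $x_\alpha \to x$) if for every neighborhood $U$ of $x$, $\langle x_\alpha \rangle$
is eventually in $U$, and $x$ is a cluster point of $\langle x_\alpha \rangle$ if for every neighborhood $U$ of $x$,
$\langle x_\alpha \rangle$ is frequently in $U$.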

The next three propositions show that nets are a good substitute for sequences. 


4.18 Proposition. If X is a topological space, $E \subset X$, and $x \in X$, then x is an
accumulation point of E iff there is a net in $E \setminus \{x\}$ that converges to x, and $x \in \overline{E}$
iff there is a net in E that converges to x.


Proof. If x is an accumulation point of E, let $\mathcal{N}$ be the set of neighborhoods
of x, directed by reverse inclusion. For each $U \in \mathcal{N}$, pick $x_U \in (U \setminus \{x\}) \cap E$.
Then $x_U \to x$. Conversely, if $x_\alpha \in E \setminus \{x\}$ and $x_\alpha \to x$, then every punctured
neighborhood of x contains some $x_\alpha$, so x is an accumulation point of E. Likewise,
if $x_\alpha \to x$ where $x_\alpha \in E$, then $x \in \overline{E}$, and the converse follows from Proposition
4.1. ∎


4.19 Proposition. If X and Y are topological spaces and $f: X \to Y$, then f is
continuous at $x \in X$ iff for every net $\langle x_\alpha \rangle$ converging to x, $\langle f(x_\alpha) \rangle$ converges to
f(x).


Proof. If f is continuous at x and V is a neighborhood of f(x), then $f^{-1}(V)$ is a
neighborhood of x. Hence, if $x_\alpha \to x$ then $\langle x_\alpha \rangle$ is eventually in $f^{-1}(V)$, so $\langle f(x_\alpha) \rangle$
is eventually in V, and thus $f(x_\alpha) \to f(x)$. On the other hand, if f is not continuous
at x, there is a neighborhood V of f(x) such that $f^{-1}(V)$ is not a neighborhood of x,
that is, $x \notin (f^{-1}(V))^\circ$, or equivalently, $x \in \overline{f^{-1}(V^c)}$. By Proposition 4.18, there is
a net $\langle x_\alpha \rangle$ in $f^{-1}(V^c)$ that converges to x. But then $f(x_\alpha) \notin V$, so $f(x_\alpha) \not\to f(x)$. ∎


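A subnet of a net $\langle x_\alpha \rangle_{\alpha \in A}$ is a net $\langle y_\beta \rangle_{\beta \in B}$ together with a map $\beta \mapsto \alpha_\beta$ from
B to A such that:


• for every $\alpha_0 \in A$ there exists $\beta_0 \in B$ such that $\alpha_\beta \succsim \alpha_0$ whenever $\beta \succsim \beta_0$;
• $y_\beta = x_{\alpha_\beta}$.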


Clearly if $\langle x_\alpha \rangle$ converges to a point x, then so does any subnet $\langle x_{\alpha_\beta} \rangle$.

Warning: The name “subnet” is used because subnets perform much the same
functions as subsequences, but it should not be taken too literally, as the mapping
$\beta \mapsto \alpha_\beta$ need not be injective. In particular, the index set B may well have larger
cardinality than the index set A, and a subnet of a sequence need not be a subsequence.


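4.20 Proposition. If $\langle x_\alpha \rangle_{\alpha \in A}$ is a net in a topological space X, then $x \in X$ is a
cluster point of $\langle x_\alpha \rangle$ iff $\langle x_\alpha \rangle$ has a subnet that converges to x.


Proof. If $\langle y_\beta \rangle = \langle x_{\alpha_\beta} \rangle$ is a subnet converging to x and U is a neighborhood
of x, choose $\beta_1 \in B$ such that $y_\beta \in U$ for $\beta \succsim \beta_1$. Also, given $\alpha \in A$, choose
$\beta_2 \in B$ such that $\alpha_\beta \succsim \alpha$ for $\beta \succsim \beta_2$. Then there exists $\beta \in B$ with $\beta \succsim \beta_1$ and
$\beta \succsim \beta_2$, and we have $\alpha_\beta \succsim \alpha$ and $x_{\alpha_\beta} = y_\beta \in U$. Thus $\langle x_\alpha \rangle$ is frequently in U,
so x is a cluster point of $\langle x_\alpha \rangle$. Conversely, if x is a cluster point of $\langle x_\alpha \rangle$, let $\mathcal{N}$ be
the set of neighborhoods of x and make $\mathcal{N} \times A$ into a directed set by declaring that
$(U, \alpha) \precsim (U', \alpha')$ iff $U \supset U'$ and $\alpha \precsim \alpha'$. For each $(U, \gamma) \in \mathcal{N} \times A$ we can choose
$\alpha_{(U,\gamma)} \in A$ such that $\alpha_{(U,\gamma)} \succsim \gamma$ and $x_{\alpha_{(U,\gamma)}} \in U$. Then if $(U', \gamma') \succsim (U, \gamma)$ we
have $\alpha_{(U',\gamma')} \succsim \gamma' \succsim \gamma$ and $x_{\alpha_{(U',\gamma')}} \in U' \subset U$, whence it follows that $\langle x_{\alpha_{(U,\gamma)}} \rangle$ is
a subnet of $\langle x_\alpha \rangle$ that converges to x. ∎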


Exercises 


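30. If A is a directed set, a subset B of A is called cofinal in A if for each $\alpha \in A$
there exists $\beta \in B$ such that $\beta \succsim \alpha$.
a. If B is cofinal in A and $\langle x_\alpha \rangle_{\alpha \in A}$ is a net, the inclusion map $B \to A$ makes
$\langle x_\beta \rangle_{\beta \in B}$ a subnet of $\langle x_\alpha \rangle_{\alpha \in A}$.
b. If $\langle x_\alpha \rangle_{\alpha \in A}$ is a net in a topological space, then $\langle x_\alpha \rangle$ converges to x iff for
every cofinal $B \subset A$ there is a cofinal $C \subset B$ such that $\langle x_\gamma \rangle_{\gamma \in C}$ converges to x.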


31. Let $\langle x_n \rangle_{n \in \mathbb{N}}$ be a sequence.
a. If $k \mapsto n_k$ is a map from $\mathbb{N}$ to itself, then $\langle x_{n_k} \rangle_{k \in \mathbb{N}}$ is a subnet of $\langle x_n \rangle$ iff
$n_k \to \infty$ as $k \to \infty$, and it is a subsequence (as defined in §0.1) iff $n_k$ is strictly
increasing in k.
b. There is a natural one-to-one correspondence between the subsequences of
$\langle x_n \rangle$ and the subnets of $\langle x_n \rangle$ defined by cofinal sets as in Exercise 30.


32. A topological space X is Hausdorff iff every net in X converges to at most
one point. (If X is not Hausdorff, let x and y be distinct points with no disjoint
neighborhoods, and consider the directed set $\mathcal{N}_x \times \mathcal{N}_y$ where $\mathcal{N}_x$, $\mathcal{N}_y$ are the families
of neighborhoods of x, y.)


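33. Let $\langle x_\alpha \rangle_{\alpha \in A}$ be a net in a topological space, and for each $\alpha \in A$ let $E_\alpha = \{x_\beta : \beta \succsim \alpha\}$. Then x is a cluster point of $\langle x_\alpha \rangle$ iff $x \in \bigcap_{\alpha \in A} \overline{E_\alpha}$.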


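34. If X has the weak topology generated by a family $\mathcal{F}$ of functions, then $\langle x_\alpha \rangle$
converges to $x \in X$ iff $\langle f(x_\alpha) \rangle$ converges to f(x) for all $f \in \mathcal{F}$. (In particular, if
$X = \prod_{i \in I} X_i$, then $x_\alpha \to x$ iff $\pi_i(x_\alpha) \to \pi_i(x)$ for all $i \in I$.)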


35. Let X be a set and $\mathcal{A}$ the collection of all finite subsets of X, directed by inclusion.
Let $f: X \to \mathbb{R}$ be an arbitrary function, and for $A \in \mathcal{A}$, let $z_A = \sum_{x \in A} f(x)$.
Then the net $\langle z_A \rangle$ converges in $\mathbb{R}$ iff $\{x : f(x) \neq 0\}$ is a countable set $\{x_n\}_{n \in \mathbb{N}}$ and
$\sum_1^\infty |f(x_n)| < \infty$, in which case $z_A \to \sum_1^\infty f(x_n)$. (Cf. Proposition 0.20.)


36. Let X be the set of Lebesgue measurable complex-valued functions on [0, 1].
There is no topology $\mathcal{T}$ on X such that a sequence $\langle f_n \rangle$ converges to f with respect
to $\mathcal{T}$ iff $f_n \to f$ a.e. (Use Corollary 2.32 and Exercises 30b and 31b.)
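A numerical look at the "only if" direction of Exercise 35, offered as a hedged illustration: take $X = \mathbb{N}$ and $f(n) = (-1)^{n+1}/n$, so that $\sum f(n)$ converges but $\sum |f(n)| = \infty$. The Python sketch below (helper names are ours) shows that beyond any finite set $A_0 = \{1, \ldots, N\}$ there is a finite $A \supset A_0$ with $z_A$ exceeding a prescribed bound, obtained by adjoining the positive terms $f(n)$ with n odd. Since this works for every bound and every $A_0$, the net $\langle z_A \rangle$ is frequently outside every bounded interval and cannot converge.

```python
def f(n):
    """f(n) = (-1)^(n+1) / n: sum f(n) converges, but sum |f(n)| diverges."""
    return (-1) ** (n + 1) / n

def escape(N, target):
    """Return (A, z_A): a finite A containing {1, ..., N} with z_A > target."""
    A = set(range(1, N + 1))
    z = sum(f(n) for n in A)
    n = N + 1 if (N + 1) % 2 == 1 else N + 2   # first odd n > N, where f(n) > 0
    while z <= target:                          # terminates: sum of 1/n over odd n diverges
        A.add(n)
        z += f(n)
        n += 2
    return A, z

for N in (10, 100, 1000):
    A, z = escape(N, target=2.0)
    print(N, len(A), round(z, 4))
```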





128 POINT SET TOPOLOGY 


4.4 COMPACT SPACES 


In §0.6 we gave three equivalent characterizations of compactness for metric spaces:
the Heine-Borel property, the Bolzano-Weierstrass property, and completeness plus
total boundedness. Only the first two of these make sense for general topological
spaces, and it is the first one that turns out to be the most useful. Accordingly, we
define a topological space X to be compact if whenever $\{U_\alpha\}_{\alpha \in A}$ is an open cover
of X (that is, a collection of open sets such that $X = \bigcup_{\alpha \in A} U_\alpha$), there is a
finite subset B of A such that $X = \bigcup_{\alpha \in B} U_\alpha$. To be brief (although somewhat
sylleptic, since the adjectives “open” and “finite” refer to different things), we say:
X is compact if every open cover of X has a finite subcover.

A subset Y of a topological space X is called compact if it is compact in the
relative topology; thus $Y \subset X$ is compact iff whenever $\{U_\alpha\}_{\alpha \in A}$ is a family of
open subsets of X with $Y \subset \bigcup_{\alpha \in A} U_\alpha$, there is a finite $B \subset A$ with $Y \subset \bigcup_{\alpha \in B} U_\alpha$.
Furthermore, Y is called precompact if its closure $\overline{Y}$ is compact.

De Morgan’s laws lead to the following characterization of compactness in terms of
closed sets. A family $\{F_\alpha\}_{\alpha \in A}$ of subsets of X is said to have the finite intersection
property if $\bigcap_{\alpha \in B} F_\alpha \neq \varnothing$ for all finite $B \subset A$.


4.21 Proposition. A topological space X is compact iff for every family $\{F_\alpha\}_{\alpha \in A}$
of closed sets with the finite intersection property, $\bigcap_{\alpha \in A} F_\alpha \neq \varnothing$.


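Proof. Let $U_\alpha = (F_\alpha)^c$. Then $U_\alpha$ is open, $\bigcap_{\alpha \in A} F_\alpha \neq \varnothing$ iff $\bigcup_{\alpha \in A} U_\alpha \neq X$,
and $\{F_\alpha\}$ has the finite intersection property iff no finite subfamily of $\{U_\alpha\}$ covers
X. The result follows. ∎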


We now list several basic facts about compact spaces. 
4.22 Proposition. A closed subset of a compact space is compact. 


Proof. If X is compact, $F \subset X$ is closed, and $\{U_\alpha\}_{\alpha \in A}$ is a family of open sets
in X with $F \subset \bigcup_{\alpha \in A} U_\alpha$, then $\{U_\alpha\}_{\alpha \in A} \cup \{F^c\}$ is an open cover of X. It has a
finite subcover, so by discarding $F^c$ from the latter if necessary, we obtain a finite
subcollection of $\{U_\alpha\}_{\alpha \in A}$ that covers F. ∎


4.23 Proposition. If F is a compact subset of a Hausdorff space X and $x \notin F$,
there are disjoint open sets U, V such that $x \in U$ and $F \subset V$.


Proof. For each $y \in F$, choose disjoint open $U_y$ and $V_y$ with $x \in U_y$ and $y \in V_y$.
The family $\{V_y\}_{y \in F}$ is an open cover of F, so it has a finite subcover $\{V_{y_j}\}_1^n$. Then $U = \bigcap_1^n U_{y_j}$
and $V = \bigcup_1^n V_{y_j}$ have the desired properties. ∎


4.24 Proposition. Every compact subset of a Hausdorff space is closed. 


Proof. According to Proposition 4.23, if F is compact then $F^c$ is a neighborhood
of each of its points, hence is open. ∎


We remark that in a non-Hausdorff space, compact sets need not be closed (for
example, every subset of a space with the trivial topology is compact), and the
intersection of compact sets need not be compact; see Exercise 37. Of course, in
a Hausdorff space the intersection of any family of compact sets is compact by
Propositions 4.22 and 4.24. Moreover, in an arbitrary topological space a finite union
of compact sets is always compact. (If $K_1, \ldots, K_n$ are compact and $\{U_\alpha\}$ is an open
cover of $\bigcup_1^n K_j$, choose a finite subcover of each $K_j$ and combine them.)


4.25 Proposition. Every compact Hausdorff space is normal. 


Proof. Suppose that X is compact Hausdorff and E, F are disjoint closed subsets
of X. By Proposition 4.22, E and F are compact. By Proposition 4.23, for each $x \in E$
there exist disjoint open sets $U_x$, $V_x$ with $x \in U_x$ and $F \subset V_x$. Since $\{U_x\}_{x \in E}$ is an open
cover of the compact set E, there is a finite subcover $\{U_{x_j}\}_1^n$. Let $U = \bigcup_1^n U_{x_j}$ and $V = \bigcap_1^n V_{x_j}$. Then
U and V are disjoint open sets with $E \subset U$ and $F \subset V$. ∎


4.26 Proposition. If X is compact and $f: X \to Y$ is continuous, then $f(X)$ is
compact.


Proof. Let $\{V_\alpha\}$ be an open cover of $f(X)$ in Y. Then $\{f^{-1}(V_\alpha)\}$ is an open
cover of X, so it has a finite subcover $\{f^{-1}(V_{\alpha_j})\}_1^n$, and $\{V_{\alpha_j}\}_1^n$ is then a finite subcover
of $f(X)$. ∎


4.27 Corollary. If X is compact, then C(X) = BC(X). 


4.28 Proposition. If X is compact and Y is Hausdorff, then any continuous bijection
$f: X \to Y$ is a homeomorphism.


Proof. If $E \subset X$ is closed, then E is compact, hence $f(E)$ is compact, hence
$f(E)$ is closed, by Propositions 4.22, 4.26, and 4.24. Since $f(E) = (f^{-1})^{-1}(E)$, this means
that $f^{-1}$ is continuous, so f is a homeomorphism. ∎


We now show that a version of the Bolzano-Weierstrass property holds for compact 
topological spaces. As one might suspect, it is merely necessary to replace sequences 
by nets. 


4.29 Theorem. If X is a topological space, the following are equivalent: 
a. X is compact. 
b. Every net in X has a cluster point. 
c. Every net in X has a convergent subnet. 


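Proof. The equivalence of (b) and (c) follows from Proposition 4.20. If X is
compact and $\langle x_\alpha \rangle$ is a net in X, let $E_\alpha = \{x_\beta : \beta \succsim \alpha\}$. Since for any $\alpha, \beta \in A$
there exists $\gamma \in A$ with $\gamma \succsim \alpha$ and $\gamma \succsim \beta$, so that $E_\gamma \subset E_\alpha \cap E_\beta$, the family
$\{\overline{E_\alpha}\}_{\alpha \in A}$ has the finite intersection property, so by Proposition 4.21,
$\bigcap_{\alpha \in A} \overline{E_\alpha} \neq \varnothing$. If $x \in \bigcap_{\alpha \in A} \overline{E_\alpha}$
and U is a neighborhood of x, then U intersects each $E_\alpha$, which means that $\langle x_\alpha \rangle$
is frequently in U, so x is a cluster point of $\langle x_\alpha \rangle$. On the other hand, if X is not
compact, let $\{U_\beta\}_{\beta \in B}$ be an open cover of X with no finite subcover. Let $\mathcal{A}$ be the
collection of finite subsets of B, directed by inclusion, and for each $A \in \mathcal{A}$ let $x_A$
be a point in $(\bigcup_{\beta \in A} U_\beta)^c$. Then $\langle x_A \rangle_{A \in \mathcal{A}}$ is a net with no cluster point. Indeed, if
$x \in X$, choose $\beta \in B$ with $x \in U_\beta$. If $A \in \mathcal{A}$ and $A \supset \{\beta\}$ then $x_A \notin U_\beta$, so x is
not a cluster point of $\langle x_A \rangle$. ∎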


We conclude by mentioning two other useful concepts related to compactness. A 
topological space X is called countably compact if every countable open cover of 
X has a finite subcover, and sequentially compact if every sequence in X has a 
convergent subsequence. Of course, every compact space is countably compact, and 
for metric spaces compactness and sequential compactness are equivalent. However,
in general neither compactness nor sequential compactness implies the other. See
Exercises 39-43 for further results and examples.


Exercises 

37. Let 0′ denote a point that is not an element of $(-1, 1)$, and let $X = (-1, 1) \cup \{0'\}$.
Let $\mathcal{T}$ be the topology on X generated by the sets $(-1, a)$, $(a, 1)$, $[(-1, b) \setminus \{0\}] \cup \{0'\}$,
and $[(c, 1) \setminus \{0\}] \cup \{0'\}$, where $-1 < a < 1$, $0 < b < 1$, and $-1 < c < 0$.
(One should picture X as $(-1, 1)$ with the point 0 split in two.)
a. Define $f, g: (-1, 1) \to X$ by $f(x) = x$ for all x, $g(x) = x$ for $x \neq 0$, and
$g(0) = 0'$. Then f and g are homeomorphisms onto their ranges.
b. X is $T_1$ but not Hausdorff, although each point of X has a neighborhood that
is homeomorphic to $(-1, 1)$ (and hence is Hausdorff).
c. The sets $[-\frac12, \frac12]$ and $([-\frac12, \frac12] \setminus \{0\}) \cup \{0'\}$ are compact but not closed in X,
and their intersection is not compact.


38. Suppose that $(X, \mathcal{T})$ is a compact Hausdorff space and $\mathcal{T}'$ is another topology
on X. If $\mathcal{T}'$ is strictly stronger than $\mathcal{T}$, then $(X, \mathcal{T}')$ is Hausdorff but not compact. If
$\mathcal{T}'$ is strictly weaker than $\mathcal{T}$, then $(X, \mathcal{T}')$ is compact but not Hausdorff.


39. Every sequentially compact space is countably compact. 


40. If X is countably compact, then every sequence in X has a cluster point. If X 
is also first countable, then X is sequentially compact. 


41. A $T_1$ space X is countably compact iff every infinite subset of X has an accumulation point.


42. The set of countable ordinals (§0.4) with the order topology (Exercise 9) is
sequentially compact and first countable but not compact. (To prove sequential
compactness, use Proposition 0.19.)


43. For $x \in [0, 1)$, let $\sum_1^\infty a_n(x) 2^{-n}$ ($a_n(x) = 0$ or 1) be the base-2
expansion of x. (If x is a dyadic rational, choose the expansion such that $a_n(x) = 0$
for n large.) Then the sequence $\langle a_n \rangle$ in $\{0, 1\}^{[0,1)}$ has no pointwise convergent
subsequence. (Hence $\{0, 1\}^{[0,1)}$, with the product topology arising from the discrete
topology on $\{0, 1\}$, is not sequentially compact. It is, however, compact, as we shall
show in §4.6.)


44. If X is countably compact and $f: X \to Y$ is continuous, then $f(X)$ is countably
compact.


45. If X is normal, then X is countably compact iff $C(X) = BC(X)$. (Use
Exercises 40 and 44. If $\langle x_n \rangle$ is a sequence in X with no cluster point, then $\{x_n : n \in \mathbb{N}\}$ is closed, and Corollary 4.17 applies.)
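A finite-stage version of the argument behind Exercise 43, offered as an illustrative sketch (Python, exact arithmetic via the standard `fractions` module; the helper names are ours). Given strictly increasing indices $n_1 < n_2 < \cdots$, the point $x = \sum_{k \text{ odd}} 2^{-n_k}$ has binary digits $a_{n_k}(x) = 1, 0, 1, 0, \ldots$, so $\langle a_{n_k}(x) \rangle$ does not converge and $\langle a_{n_k} \rangle$ is not pointwise convergent. Code can only handle finitely many indices, which makes x a dyadic rational whose expansion ends in zeros, as in the exercise's convention; with infinitely many indices the same construction gives a point with the same digit pattern.

```python
from fractions import Fraction

def digit(x, n):
    """n-th binary digit a_n(x) of x in [0, 1), using the expansion that ends in zeros."""
    return int(x * 2 ** n) % 2          # floor(2^n x) mod 2, exact for Fractions

def alternating_point(indices):
    """x = sum of 2^(-n_k) over the first, third, fifth, ... indices,
    so that a_{n_k}(x) = 1, 0, 1, 0, ... along the given subsequence."""
    return sum((Fraction(1, 2 ** n) for k, n in enumerate(indices) if k % 2 == 0),
               Fraction(0))

indices = [k * k + 1 for k in range(1, 12)]   # an arbitrary strictly increasing choice
x = alternating_point(indices)
print([digit(x, n) for n in indices])
```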


4.5 LOCALLY COMPACT HAUSDORFF SPACES 


A topological space is called locally compact if every point has a compact neighbor- 
hood. We shall be mainly concerned with locally compact Hausdorff spaces, which 
we call LCH spaces for short. 


4.30 Proposition. If X is an LCH space, $U \subset X$ is open, and $x \in U$, there is a
compact neighborhood N of x such that $N \subset U$.


Proof. We may assume $\overline{U}$ is compact; otherwise, replace U by $U \cap F^\circ$ where
F is a compact neighborhood of x. By Proposition 4.23, there are disjoint relatively
open sets V, W in $\overline{U}$ with $x \in V$ and $\partial U \subset W$. Then V is open in X since $V \subset U$,
and $\overline{V}$ is a closed and hence compact subset of $\overline{U} \setminus W$. Thus we may take $N = \overline{V}$. ∎


4.31 Proposition. If X is an LCH space and $K \subset U \subset X$ where K is compact and
U is open, there exists a precompact open V such that $K \subset V \subset \overline{V} \subset U$.


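Proof. By Proposition 4.30, for each $x \in K$ we can choose a compact neighborhood
$N_x$ of x with $N_x \subset U$. Then $\{N_x^\circ\}_{x \in K}$ is an open cover of K, so there
is a finite subcover $\{N_{x_j}^\circ\}_1^n$. Let $V = \bigcup_1^n N_{x_j}^\circ$; then $K \subset V$ and $\overline{V} \subset \bigcup_1^n N_{x_j}$, which is
compact and contained in U. ∎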


4.32 Urysohn’s Lemma, Locally Compact Version. If X is an LCH space and
$K \subset U \subset X$ where K is compact and U is open, there exists $f \in C(X, [0, 1])$ such
that f = 1 on K and f = 0 outside a compact subset of U.


Proof. Let V be as in Proposition 4.31. Then $\overline{V}$ is normal by Proposition 4.25,
so by Urysohn’s lemma 4.15 there exists $f \in C(\overline{V}, [0, 1])$ such that f = 1 on K
and f = 0 on $\partial V$. We extend f to X by setting f = 0 on $\overline{V}^c$. Suppose that
$E \subset [0, 1]$ is closed. If $0 \notin E$ we have $f^{-1}(E) = (f|\overline{V})^{-1}(E)$, and if $0 \in E$ we
have $f^{-1}(E) = (f|\overline{V})^{-1}(E) \cup \overline{V}^c = (f|\overline{V})^{-1}(E) \cup V^c$ since $(f|\overline{V})^{-1}(E) \supset \partial V$.
In either case, $f^{-1}(E)$ is closed, so f is continuous. ∎
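When X is a metric space, a function with the properties in Urysohn's lemma can be written down explicitly. The Python sketch below is a hedged illustration of that special case on $\mathbb{R}$, not of the proof above: it takes K = [a, b] and U = (c, d), chooses an intermediate open interval W with $K \subset W \subset \overline{W} \subset U$ (the half-gap margin is our assumption), and sets $f(x) = d(x, W^c)/(d(x, W^c) + d(x, K))$. The denominator never vanishes, f = 1 on K, and f = 0 outside W, so f vanishes outside the compact set $\overline{W} \subset U$.

```python
def dist_to_interval(x, lo, hi):
    """Distance from x to the closed interval [lo, hi]."""
    return max(lo - x, 0.0, x - hi)

def dist_to_complement(x, lo, hi):
    """Distance from x to the complement of the open interval (lo, hi)."""
    return max(0.0, min(x - lo, hi - x))

def urysohn(a, b, c, d):
    """f in C(R, [0, 1]) with f = 1 on K = [a, b] and f = 0 off W = (lo, hi),
    where the closure of W is a compact subset of U = (c, d); assumes c < a <= b < d."""
    margin = min(a - c, d - b) / 2          # hypothetical choice of W
    lo, hi = a - margin, b + margin
    def f(x):
        outside = dist_to_complement(x, lo, hi)   # d(x, W^c)
        inside = dist_to_interval(x, a, b)        # d(x, K)
        return outside / (outside + inside)       # > 0 denominator: K and W^c are disjoint and closed
    return f, (lo, hi)

f, (lo, hi) = urysohn(0.0, 1.0, -1.0, 3.0)
print(lo, hi, f(0.5), f(-0.25), f(2.5))
```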


4.33 Corollary. Every LCH space is completely regular.


4.34 Tietze Extension Theorem, Locally Compact Version. Suppose that X is an
LCH space and $K \subset X$ is compact. If $f \in C(K)$, there exists $F \in C(X)$ such that
$F|K = f$. Moreover, F may be taken to vanish outside a compact set.


The proof is similar to that of Theorem 4.32; details are left to the reader (Exercise 
46). 

The preceding results show that LCH spaces have a rich supply of continuous 
functions that vanish outside compact sets. Let us introduce some terminology: If X 
is a topological space and $f \in C(X)$, the support of f, denoted by $\operatorname{supp}(f)$, is the
smallest closed set outside of which f vanishes, that is, the closure of $\{x : f(x) \neq 0\}$.
If $\operatorname{supp}(f)$ is compact, we say that f is compactly supported, and we define


$$C_c(X) = \{f \in C(X) : \operatorname{supp}(f) \text{ is compact}\}.$$


Moreover, if $f \in C(X)$, we say that f vanishes at infinity if for every $\epsilon > 0$ the set
$\{x : |f(x)| \ge \epsilon\}$ is compact, and we define


$$C_0(X) = \{f \in C(X) : f \text{ vanishes at infinity}\}.$$


Clearly $C_c(X) \subset C_0(X)$. Moreover, $C_0(X) \subset BC(X)$, because for $f \in C_0(X)$
the image under f of the set $\{x : |f(x)| \ge \epsilon\}$ is compact, and $|f| < \epsilon$ on its complement.


4.35 Proposition. If X is an LCH space, $C_0(X)$ is the closure of $C_c(X)$ in the
uniform metric.


Proof. If $\{f_n\}$ is a sequence in $C_c(X)$ that converges uniformly to $f \in C(X)$,
for each $\epsilon > 0$ there exists $n \in \mathbb{N}$ such that $\|f_n - f\|_u < \epsilon$. Then $|f(x)| < \epsilon$
if $x \notin \operatorname{supp}(f_n)$, so $f \in C_0(X)$. Conversely, if $f \in C_0(X)$, for $n \in \mathbb{N}$ let
$K_n = \{x : |f(x)| \ge n^{-1}\}$. Then $K_n$ is compact, so by Theorem 4.32 there exists
$g_n \in C_c(X)$ with $0 \le g_n \le 1$ and $g_n = 1$ on $K_n$. Let $f_n = g_n f$. Then $f_n \in C_c(X)$
and $\|f_n - f\|_u \le n^{-1}$, so $f_n \to f$ uniformly. ∎
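The second half of this proof can be traced concretely on $X = \mathbb{R}$. The Python sketch below uses our own choices throughout: $f(x) = 1/(1 + x^2)$, which lies in $C_0(\mathbb{R})$ but not in $C_c(\mathbb{R})$; $K_n = \{|f| \ge n^{-1}\} = [-\sqrt{n-1}, \sqrt{n-1}]$; and, in place of the functions supplied by Theorem 4.32, piecewise-linear cutoffs $g_n$ equal to 1 on $K_n$ and vanishing outside $[-\sqrt{n-1} - 1, \sqrt{n-1} + 1]$. It checks on a grid that $|g_n f - f| \le n^{-1}$, since $g_n < 1$ only where $|f| < n^{-1}$.

```python
import math

def f(x):
    return 1.0 / (1.0 + x * x)            # vanishes at infinity, but supp f = R

def cutoff(r):
    """Piecewise-linear g with 0 <= g <= 1, g = 1 on [-r, r], g = 0 off [-r-1, r+1]."""
    def g(x):
        return min(1.0, max(0.0, r + 1.0 - abs(x)))
    return g

def approx_error(n, grid):
    """Grid estimate of ||g_n f - f||_u, where K_n = {|f| >= 1/n} = [-r, r]."""
    r = math.sqrt(n - 1)
    g = cutoff(r)
    return max(abs(g(x) * f(x) - f(x)) for x in grid)

grid = [i * 0.01 - 50.0 for i in range(10001)]
for n in (2, 10, 100, 1000):
    print(n, approx_error(n, grid), 1.0 / n)
```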





If X is a noncompact LCH space, it is possible to make X into a compact space
by adding a single point “at infinity” in such a way that the functions in $C_0(X)$ are
precisely those continuous functions f such that $f(x) \to 0$ as x approaches the point
at infinity. More precisely, let $\infty$ denote a point that is not an element of X, let
$X^* = X \cup \{\infty\}$, and let $\mathcal{T}$ be the collection of all subsets U of $X^*$ such that either (i)
U is an open subset of X, or (ii) $\infty \in U$ and $U^c$ is a compact subset of X.


4.36 Proposition. If X, $X^*$, and $\mathcal{T}$ are as above, then $(X^*, \mathcal{T})$ is a compact Hausdorff
space, and the inclusion map $\iota: X \to X^*$ is an embedding. Moreover, if
$f \in C(X)$, then f extends continuously to $X^*$ iff $f = g + c$ where $g \in C_0(X)$ and
c is a constant, in which case the continuous extension is given by $f(\infty) = c$.


The proof is straightforward and is left to the reader (Exercise 47). The space $X^*$
is called the one-point compactification or Alexandroff compactification of X.
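For $X = \mathbb{R}^n$ the one-point compactification can be pictured concretely: inverse stereographic projection maps $\mathbb{R}^n$ homeomorphically onto the unit sphere $S^n \subset \mathbb{R}^{n+1}$ minus the north pole $N = (0, \ldots, 0, 1)$, and sending $\infty$ to N gives the homeomorphism $X^* \cong S^n$ asserted in Exercise 52. In the Python sketch below (standard formulas; the function names are ours, and n = 2 is an arbitrary choice) the distance from the image of x to N equals $2/\sqrt{|x|^2 + 1}$, so points outside large compact subsets of $\mathbb{R}^n$ land near N, matching the neighborhoods of $\infty$ described in (ii).

```python
import math

def to_sphere(x):
    """Inverse stereographic projection: R^n -> S^n minus the north pole."""
    s = sum(t * t for t in x)
    return [2 * t / (s + 1) for t in x] + [(s - 1) / (s + 1)]

def from_sphere(y):
    """Stereographic projection from the north pole: S^n minus the pole -> R^n."""
    *head, last = y
    return [t / (1 - last) for t in head]

def dist(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

north = [0.0, 0.0, 1.0]            # plays the role of the point at infinity (n = 2)
for r in (1.0, 10.0, 1000.0):
    x = [r, -r / 2]
    y = to_sphere(x)
    print(r, dist(y, north))       # shrinks like 2 / sqrt(|x|^2 + 1)
```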

If X is a topological space, the space $\mathbb{C}^X$ of all complex-valued functions on X
can be topologized in various ways. One way, of course, is the product topology,
that is, the topology of pointwise convergence. Another is the topology of uniform
convergence, which is generated by the sets


$$\Big\{g \in \mathbb{C}^X : \sup_{x \in X} |g(x) - f(x)| < n^{-1}\Big\} \qquad (n \in \mathbb{N},\ f \in \mathbb{C}^X).$$


The proof of Proposition 4.13 shows that C(X) is a closed subspace of $\mathbb{C}^X$ in the
topology of uniform convergence. Intermediate between these two topologies is the
topology of uniform convergence on compact sets, which is generated by the sets


$$\Big\{g \in \mathbb{C}^X : \sup_{x \in K} |g(x) - f(x)| < n^{-1}\Big\} \qquad (n \in \mathbb{N},\ f \in \mathbb{C}^X,\ K \subset X \text{ compact}).$$


We shall now examine this topology in the case where X is an LCH space. 


4.37 Lemma. If X is an LCH space and $E \subset X$, then E is closed iff $E \cap K$ is
closed for every compact $K \subset X$.


Proof. If E is closed, then $E \cap K$ is closed by Propositions 4.22 and 4.24. If E
is not closed, pick $x \in \overline{E} \setminus E$ and let K be a compact neighborhood of x. Then x is
an accumulation point of $E \cap K$ but is not in $E \cap K$, so by Proposition 4.1 $E \cap K$
is not closed. ∎


4.38 Proposition. If X is an LCH space, C(X) is a closed subspace of $\mathbb{C}^X$ in the
topology of uniform convergence on compact sets.


Proof. If f is in the closure of C(X), then f is a uniform limit of continuous
functions on each compact $K \subset X$, so $f|K$ is continuous. If $E \subset \mathbb{C}$ is closed,
$f^{-1}(E) \cap K = (f|K)^{-1}(E)$ is thus closed for each compact K, so by Lemma 4.37
$f^{-1}(E)$ is closed, whence f is continuous. ∎


A topological space X is called σ-compact if it is a countable union of compact
sets. To appreciate the significance of the next two propositions, see Exercise 54.


4.39 Proposition. If X is a σ-compact LCH space, there is a sequence $\{U_n\}$ of
precompact open sets such that $\overline{U_n} \subset U_{n+1}$ for all n and $X = \bigcup_1^\infty U_n$.


Proof. Suppose $X = \bigcup_1^\infty K_n$ where each $K_n$ is compact. Every compact subset
of X has a precompact open neighborhood by Proposition 4.31. Thus we may take
$U_1$ to be a precompact open neighborhood of $K_1$, and then, proceeding inductively,
take $U_n$ to be a precompact open neighborhood of $\overline{U_{n-1}} \cup K_n$. ∎


4.40 Proposition. If X is a σ-compact LCH space and $\{U_n\}$ is as in Proposition
4.39, then for each $f \in \mathbb{C}^X$ the sets


$$\Big\{g \in \mathbb{C}^X : \sup_{x \in U_n} |g(x) - f(x)| < m^{-1}\Big\} \qquad (m, n \in \mathbb{N})$$


form a neighborhood base for f in the topology of uniform convergence on compact
sets. Hence this topology is first countable, and $f_j \to f$ uniformly on compact sets
iff $f_j \to f$ uniformly on each $U_n$.


Proof. These assertions follow easily from the observation that if $K \subset X$ is
compact, then $\{U_n\}_1^\infty$ is an open cover of K and hence, since the $U_n$ increase, $K \subset U_n$
for some n. Details are left to the reader (Exercise 48). ∎
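As an illustration of Proposition 4.40, take $X = \mathbb{R}$ and $U_n = (-n, n)$, which satisfy the conditions of Proposition 4.39. The functions $f_k(x) = x/k$ converge to 0 uniformly on each $U_n$, since $\sup_{U_n} |f_k| = n/k$, but not uniformly on $\mathbb{R}$. The Python sketch below (our own names; sups are taken over finite grids, and the series is truncated after 40 terms, which changes it by at most $2^{-40}$) evaluates the metric of Exercise 56d, $\rho(f, g) = \sum_{n \ge 1} 2^{-n} \Phi(\sup_{U_n} |f - g|)$ with $\Phi(t) = t/(t + 1)$, and shows $\rho(f_k, 0) \to 0$.

```python
def phi(t):
    """Phi(t) = t / (t + 1) on [0, inf), with Phi(inf) = 1 (Exercise 56)."""
    return 1.0 if t == float("inf") else t / (t + 1.0)

def sup_on(h, n, steps=2000):
    """Grid approximation of sup |h| over U_n = (-n, n); the grid includes +-n,
    so for continuous h it approximates the sup over the closure of U_n."""
    return max(abs(h(-n + 2.0 * n * i / steps)) for i in range(steps + 1))

def rho(f, g, terms=40):
    """Truncation of sum_n 2^-n Phi(sup_{U_n} |f - g|); the omitted tail is at most 2^-terms."""
    return sum(2.0 ** -n * phi(sup_on(lambda x: f(x) - g(x), n))
               for n in range(1, terms + 1))

zero = lambda x: 0.0
distances = []
for k in (1, 10, 100, 1000):
    fk = lambda x, k=k: x / k        # f_k(x) = x / k, with sup over U_n equal to n / k
    distances.append(rho(fk, zero))
print(distances)
```

By contrast, the uniform metric of Exercise 56c gives $\Phi(\sup_{\mathbb{R}} |f_k|) = \Phi(\infty) = 1$ for every k.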


We close this section with a construction that is useful in a number of situations.
If X is a topological space and $E \subset X$, a partition of unity on E is a collection
$\{h_\alpha\}_{\alpha \in A}$ of functions in $C(X, [0, 1])$ such that


• each $x \in X$ has a neighborhood on which only finitely many $h_\alpha$’s are nonzero;
• $\sum_{\alpha \in A} h_\alpha(x) = 1$ for $x \in E$.


A partition of unity $\{h_\alpha\}$ is subordinate to an open cover $\mathcal{U}$ of E if for each α there
exists $U \in \mathcal{U}$ with $\operatorname{supp}(h_\alpha) \subset U$.


4.41 Proposition. Let X be an LCH space, K a compact subset of X, and $\{U_j\}_1^n$ an
open cover of K. There is a partition of unity on K subordinate to $\{U_j\}_1^n$ consisting
of compactly supported functions.


Proof. By Proposition 4.30, each $x \in K$ has a compact neighborhood $N_x$
such that $N_x \subset U_j$ for some j. Since $\{N_x^\circ\}_{x \in K}$ is an open cover of K, there exist
$x_1, \ldots, x_m$ such that $K \subset \bigcup_1^m N_{x_k}^\circ$. Let $F_j$ be the union of those $N_{x_k}$’s that
are subsets of $U_j$. Then $F_j$ is a compact subset of $U_j$, so by Urysohn’s lemma
there exist $g_1, \ldots, g_n \in C_c(X, [0, 1])$ with $g_j = 1$ on $F_j$ and $\operatorname{supp}(g_j) \subset U_j$.
Since the $F_j$’s cover K we have $\sum_1^n g_j \ge 1$ on K, so by Urysohn again there
exists $f \in C_c(X, [0, 1])$ with f = 1 on K and $\operatorname{supp}(f) \subset \{x : \sum_1^n g_j(x) > 0\}$.
Let $g_{n+1} = 1 - f$, so that $\sum_1^{n+1} g_j > 0$ everywhere, and for $j = 1, \ldots, n$ let
$h_j = g_j / \sum_1^{n+1} g_k$. Then $\operatorname{supp}(h_j) = \operatorname{supp}(g_j) \subset U_j$ and $\sum_1^n h_j = 1$ on K. ∎
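The construction in this proof can be carried out by hand on $X = \mathbb{R}$. In the Python sketch below, every specific number is an illustrative choice of ours: $K = [0, 3]$ with the open cover $U_1 = (-1, 2)$, $U_2 = (1, 4)$; compact sets $F_1 = [-0.5, 1.5] \subset U_1$ and $F_2 = [1.2, 3.5] \subset U_2$ covering K; trapezoidal functions $g_1, g_2$ equal to 1 on $F_1, F_2$ with supports in $U_1, U_2$; and a trapezoid f equal to 1 on K with support inside $\{g_1 + g_2 > 0\}$. Setting $g_3 = 1 - f$ and $h_j = g_j/(g_1 + g_2 + g_3)$ then gives a partition of unity on K subordinate to $\{U_1, U_2\}$.

```python
def trapezoid(a, b, c, d):
    """Continuous function: 0 off (a, d), 1 on [b, c], linear in between (a < b <= c < d)."""
    def t(x):
        if x <= a or x >= d:
            return 0.0
        if x < b:
            return (x - a) / (b - a)
        if x > c:
            return (d - x) / (d - c)
        return 1.0
    return t

g1 = trapezoid(-0.9, -0.5, 1.5, 1.9)   # = 1 on F_1; supp g1 = [-0.9, 1.9] inside U_1 = (-1, 2)
g2 = trapezoid(1.1, 1.2, 3.5, 3.9)     # = 1 on F_2; supp g2 = [1.1, 3.9] inside U_2 = (1, 4)
f = trapezoid(-0.5, 0.0, 3.0, 3.5)     # = 1 on K; supp f = [-0.5, 3.5] inside {g1 + g2 > 0}

def g3(x):
    return 1.0 - f(x)

def h(j, x):
    total = g1(x) + g2(x) + g3(x)       # positive everywhere, as in the proof
    return (g1(x) if j == 1 else g2(x)) / total

grid = [-2.0 + 0.001 * i for i in range(7001)]
print(max(abs(h(1, x) + h(2, x) - 1.0) for x in grid if 0.0 <= x <= 3.0))
```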


A generalization of this result may be found in Exercise 57. 


Exercises 
46. Prove Theorem 4.34. 


47. Prove Proposition 4.36. Also, show that if X is Hausdorff but not locally 
compact, Proposition 4.36 remains valid except that X* is not Hausdorff. 


48. Complete the proof of Proposition 4.40. 


49. Let X be a compact Hausdorff space and $E \subset X$.
a. If E is open, then E is locally compact in the relative topology.
b. If E is dense in X and locally compact in the relative topology, then E is
open. (Use Exercise 13.)
c. E is locally compact in the relative topology iff E is relatively open in $\overline{E}$.


50. Let U be an open subset of a compact Hausdorff space X and $U^*$ its one-point
compactification (see Exercise 49a). If $\phi: X \to U^*$ is defined by $\phi(x) = x$ if $x \in U$
and $\phi(x) = \infty$ if $x \in U^c$, then φ is continuous.


51. If X and Y are topological spaces, $\phi \in C(X, Y)$ is called proper if $\phi^{-1}(K)$ is
compact in X for every compact $K \subset Y$. Suppose that X and Y are LCH spaces and
$X^*$ and $Y^*$ are their one-point compactifications. If $\phi \in C(X, Y)$, then φ is proper
iff φ extends continuously to a map from $X^*$ to $Y^*$ by setting $\phi(\infty_X) = \infty_Y$.


52. The one-point compactification of $\mathbb{R}^n$ is homeomorphic to the n-sphere $\{x \in \mathbb{R}^{n+1} : |x| = 1\}$.


53. Lemma 4.37 remains true if the assumption that X is locally compact is replaced 
by the assumption that X is first countable. 


54. Let $\mathbb{Q}$ have the relative topology induced from $\mathbb{R}$.
a. $\mathbb{Q}$ is not locally compact.
b. $\mathbb{Q}$ is σ-compact (it is a countable union of singleton sets), but uniform convergence
on singletons (i.e., pointwise convergence) does not imply uniform
convergence on compact subsets of $\mathbb{Q}$.


55. Every open set in a second countable LCH space is σ-compact.


56. Define $\Phi: [0, \infty] \to [0, 1]$ by $\Phi(t) = t/(t + 1)$ for $t \in [0, \infty)$ and $\Phi(\infty) = 1$.
a. Φ is strictly increasing and $\Phi(t + s) \le \Phi(t) + \Phi(s)$.
b. If (Y, ρ) is a metric space, then $\Phi \circ \rho$ is a bounded metric on Y that defines
the same topology as ρ.
c. If X is a topological space, the function $\rho(f, g) = \Phi\big(\sup_{x \in X} |f(x) - g(x)|\big)$
is a metric on $\mathbb{C}^X$ whose associated topology is the topology of uniform convergence.
d. If X is a σ-compact LCH space and $\{U_n\}_1^\infty$ is as in Proposition 4.39, the
function


$$\rho(f, g) = \sum_{n=1}^\infty 2^{-n}\, \Phi\Big[\sup_{x \in U_n} |f(x) - g(x)|\Big]$$


is a metric on $\mathbb{C}^X$ whose associated topology is the topology of uniform convergence
on compact sets.


57. An open cover $\mathcal{U}$ of a topological space X is called locally finite if each $x \in X$
has a neighborhood that intersects only finitely many members of $\mathcal{U}$. If $\mathcal{U}, \mathcal{V}$ are
open covers of X, $\mathcal{V}$ is a refinement of $\mathcal{U}$ if for each $V \in \mathcal{V}$ there exists $U \in \mathcal{U}$
with $V \subset U$. X is called paracompact if every open cover of X has a locally finite
refinement.
a. If X is a σ-compact LCH space, then X is paracompact. In fact, every open
cover $\mathcal{U}$ has locally finite refinements $\{V_\alpha\}$, $\{W_\alpha\}$ such that $\overline{V_\alpha}$ is compact
and $\overline{W_\alpha} \subset V_\alpha$ for all α. (Let $\{U_n\}_1^\infty$ be as in Proposition 4.39. For each n,
$\{E \cap (U_{n+2} \setminus \overline{U_{n-1}}) : E \in \mathcal{U}\}$ is an open cover of $\overline{U_{n+1}} \setminus U_n$. Choose a finite
subcover to obtain $\{V_\alpha\}$ and mimic the beginning of the proof of Proposition
4.41 to obtain $\{W_\alpha\}$.)

b. If X is a σ-compact LCH space, for any open cover $\mathcal{U}$ of X there is a partition
of unity on X subordinate to $\mathcal{U}$ and consisting of compactly supported functions.


4.6 TWO COMPACTNESS THEOREMS 


The geometric objects on which one does analysis (Euclidean spaces, manifolds, 
etc.) tend to be compact or locally compact. However, in infinite-dimensional spaces 
such as spaces of functions, compactness is a rather rare phenomenon and is to be 
greatly prized when it is available. Almost all compactness results in such situations 
are obtained via two basic theorems, Tychonoff’s theorem and the Arzelà-Ascoli
theorem, which we present in this section. 

Tychonoff’s theorem has to do with compactness of Cartesian products. To prepare
for it, we introduce some notation. Recall that an element x of $X = \prod_{\alpha \in A} X_\alpha$ is,
strictly speaking, a mapping from A into $\bigcup_{\alpha \in A} X_\alpha$; namely, $x(\alpha) \in X_\alpha$ is the αth
coordinate of x, which we generally denote by $\pi_\alpha(x)$. If $B \subset A$, there is a natural
map $\pi_B: X \to \prod_{\alpha \in B} X_\alpha$; namely, $\pi_B(x)$ is the restriction of the map x to B. (In
particular, $\pi_{\{\alpha\}}$ is essentially identical to $\pi_\alpha$, and we shall not distinguish between
them.) If $p \in \prod_{\alpha \in B} X_\alpha$ and $q \in \prod_{\alpha \in C} X_\alpha$, we shall say that q is an extension of p
if q extends p as a mapping, that is, if $B \subset C$ and $p(\alpha) = q(\alpha)$ for $\alpha \in B$.


4.42 Tychonoff’s Theorem. If $\{X_\alpha\}_{\alpha \in A}$ is any family of compact topological
spaces, then $X = \prod_{\alpha \in A} X_\alpha$ (with the product topology) is compact.


Proof. By Theorem 4.29, it is enough to show that any net $\langle x_i \rangle_{i \in I}$ in X has a
cluster point. We shall do this by examining cluster points of the nets $\langle \pi_B(x_i) \rangle$ in
the subproducts of X. To wit, let


$$\mathcal{P} = \bigcup_{B \subset A} \Big\{p \in \prod_{\alpha \in B} X_\alpha : p \text{ is a cluster point of } \langle \pi_B(x_i) \rangle\Big\}.$$


$\mathcal{P}$ is nonempty, because each $X_\alpha$ is compact and so $\langle \pi_B(x_i) \rangle$ has cluster points when
$B = \{\alpha\}$. Moreover, $\mathcal{P}$ is partially ordered by extension; that is, $p \le q$ if q is an
extension of p as defined above.

Suppose that $\{p_l : l \in L\}$ is a linearly ordered subset of $\mathcal{P}$, where $p_l \in \prod_{\alpha \in B_l} X_\alpha$.
Let $B^* = \bigcup_{l \in L} B_l$, and let $p^*$ be the unique element of $\prod_{\alpha \in B^*} X_\alpha$ that extends
every $p_l$. We claim that $p^* \in \mathcal{P}$. Indeed, from the definition of the product topology,
any neighborhood of $p^*$ contains a set of the form $\prod_{\alpha \in B^*} U_\alpha$ where each $U_\alpha$ is open
in $X_\alpha$ and $U_\alpha = X_\alpha$ for all but finitely many α, say $\alpha_1, \ldots, \alpha_n$. Each of these $\alpha_j$’s
belongs to some $B_l$, so by linearity of the ordering they all belong to a single $B_l$. But
then $\prod_{\alpha \in B_l} U_\alpha$ is a neighborhood of $p_l$, so $\langle \pi_{B_l}(x_i) \rangle$ is frequently in $\prod_{\alpha \in B_l} U_\alpha$,
hence $\langle \pi_{B^*}(x_i) \rangle$ is frequently in $\prod_{\alpha \in B^*} U_\alpha$, so $p^*$ is a cluster point of $\langle \pi_{B^*}(x_i) \rangle$.
Therefore $p^*$ is an upper bound for $\{p_l\}$ in $\mathcal{P}$.

By Zorn’s lemma, then, $\mathcal{P}$ has a maximal element $p \in \prod_{\alpha \in B} X_\alpha$. We claim
that B = A. If not, pick $\gamma \in A \setminus B$. By Proposition 4.20 there is a subnet
$\langle \pi_B(x_{i(j)}) \rangle_{j \in J}$ of $\langle \pi_B(x_i) \rangle$ that converges to p, and since $X_\gamma$ is compact, there is
a subnet $\langle \pi_\gamma(x_{i(j(k))}) \rangle_{k \in K}$ of $\langle \pi_\gamma(x_{i(j)}) \rangle$ that converges to some $p_\gamma \in X_\gamma$. Let q
be the unique element of $\prod_{\alpha \in B \cup \{\gamma\}} X_\alpha$ that extends both p and $p_\gamma$; then the net
$\langle \pi_{B \cup \{\gamma\}}(x_{i(j(k))}) \rangle_{k \in K}$ converges to q and hence q is a cluster point of $\langle \pi_{B \cup \{\gamma\}}(x_i) \rangle$,
contradicting the maximality of p. Therefore p is a cluster point of $\langle x_i \rangle$, and we are
done. ∎


We now turn to the Arzelà-Ascoli theorem, which has to do with compactness in 
spaces of continuous mappings. There are several variants of this result; the theorems 
below are two of the most useful ones. See also Exercise 61. 

If X is a topological space and $\mathcal{F} \subset C(X)$, $\mathcal{F}$ is called equicontinuous at $x \in X$
if for every $\epsilon > 0$ there is a neighborhood U of x such that $|f(y) - f(x)| < \epsilon$ for all
$y \in U$ and all $f \in \mathcal{F}$, and $\mathcal{F}$ is called equicontinuous if it is equicontinuous at each
$x \in X$. Also, $\mathcal{F}$ is said to be pointwise bounded if $\{f(x) : f \in \mathcal{F}\}$ is a bounded
subset of $\mathbb{C}$ for each $x \in X$.
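A standard example of a pointwise bounded family that is not equicontinuous is $\mathcal{F} = \{x \mapsto \sin kx : k \in \mathbb{N}\} \subset C(\mathbb{R})$. At x = 0 with ε = 1/2, no neighborhood works for all f at once: for any δ > 0, once $k > \pi/(2\delta)$ the point $y = \pi/(2k)$ lies in $(-\delta, \delta)$ while $|\sin ky - \sin 0| = 1$. The short Python check below (the helper name is ours) produces such a witness for several values of δ.

```python
import math

def witness(delta):
    """For F = {x -> sin(k x)}, return (k, y) with 0 < y < delta and |sin(k y) - sin(0)| = 1."""
    k = math.floor(math.pi / (2 * delta)) + 1   # guarantees k > pi / (2 delta)
    y = math.pi / (2 * k)                       # so y < delta and k y = pi / 2
    return k, y

for delta in (1.0, 0.1, 1e-3, 1e-6):
    k, y = witness(delta)
    print(delta, k, abs(math.sin(k * y) - math.sin(0.0)))
```

By contrast, the translates $x \mapsto \sin(x + c)$ form an equicontinuous family, since each of them is 1-Lipschitz.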


4.43 Arzelà-Ascoli Theorem I. Let X be a compact Hausdorff space. If $\mathcal{F}$ is an
equicontinuous, pointwise bounded subset of C(X), then $\mathcal{F}$ is totally bounded in the
uniform metric, and the closure of $\mathcal{F}$ in C(X) is compact.


Proof. Suppose $\epsilon > 0$. Since $\mathcal{F}$ is equicontinuous, for each $x \in X$ there is an
open neighborhood $U_x$ of x such that $|f(y) - f(x)| < \frac14 \epsilon$ for all $y \in U_x$ and all
$f \in \mathcal{F}$. Since X is compact, we can choose $x_1, \ldots, x_n \in X$ such that $\bigcup_1^n U_{x_j} = X$.
Then by pointwise boundedness, $\{f(x_j) : f \in \mathcal{F},\ 1 \le j \le n\}$ is a bounded subset
of $\mathbb{C}$, so there is a finite set $\{z_1, \ldots, z_m\} \subset \mathbb{C}$ that is $\frac14 \epsilon$-dense in it, that is,
each $f(x_j)$ is at a distance less than $\frac14 \epsilon$ from some $z_k$. Let $A = \{x_1, \ldots, x_n\}$ and
$B = \{z_1, \ldots, z_m\}$; then the set $B^A$ of functions from A to B is finite. For each
$\phi \in B^A$, let


$$\mathcal{F}_\phi = \{f \in \mathcal{F} : |f(x_j) - \phi(x_j)| < \tfrac14 \epsilon \text{ for } 1 \le j \le n\}.$$


Then clearly $\bigcup_{\phi \in B^A} \mathcal{F}_\phi = \mathcal{F}$, and we claim that each $\mathcal{F}_\phi$ has diameter at most ε, so
we obtain a finite ε-dense subset of $\mathcal{F}$ by picking one f from each nonempty $\mathcal{F}_\phi$. To
prove the claim, suppose $f, g \in \mathcal{F}_\phi$. Since $|f - \phi| < \frac14 \epsilon$ and $|g - \phi| < \frac14 \epsilon$ on A, we
have $|f - g| < \frac12 \epsilon$ on A. If $x \in X$, we have $x \in U_{x_j}$ for some j, and then


$$|f(x) - g(x)| \le |f(x) - f(x_j)| + |f(x_j) - g(x_j)| + |g(x_j) - g(x)| < \epsilon.$$


This shows that $\mathcal{F}$ is totally bounded. Since the closure of a totally bounded set is
totally bounded and C(X) is complete, the theorem is proved. ∎
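The counting argument in this proof can be simulated. The Python sketch below rests on our own assumptions: X = [0, 1]; $\mathcal{F}$ is the family of real functions with $|f| \le 1$ and Lipschitz constant at most 1 (equicontinuous and pointwise bounded), sampled as random piecewise-linear functions on a fine grid; A consists of grid nodes spaced ε/4 apart; and B is the ε/4-spaced set $\{-1 + m\epsilon/4\}$ in [-1, 1], which is $\frac14\epsilon$-dense in [-1, 1]. Functions are grouped by the label $\phi \in B^A$ recording the nearest point of B to each $f(x_j)$, and the sketch checks that every group has sup-diameter less than ε, as the triangle-inequality estimate predicts. (For piecewise-linear functions with common breakpoints, the sup of $|f - g|$ is attained at grid points.)

```python
import math
import random

def random_lipschitz(rng, n_points):
    """Values on a uniform grid of [0, 1] of a function with |f| <= 1 and
    Lipschitz constant <= 1 (a clipped random walk; clipping keeps both bounds)."""
    step = 1.0 / (n_points - 1)
    vals = [rng.uniform(-1.0, 1.0)]
    for _ in range(n_points - 1):
        vals.append(min(1.0, max(-1.0, vals[-1] + step * rng.uniform(-1.0, 1.0))))
    return vals

def label(vals, eps, sub):
    """The map phi: A -> B, encoded as indices m of the nearest points -1 + m*eps/4."""
    delta = eps / 4.0
    return tuple(round((v + 1.0) / delta) for v in vals[::sub])   # A = every sub-th grid point

def classes_by_label(funcs, eps, sub):
    groups = {}
    for vals in funcs:
        groups.setdefault(label(vals, eps, sub), []).append(vals)
    return groups

def max_diameter(groups):
    """Largest grid sup-distance between two functions sharing a label."""
    return max((max(abs(a - b) for a, b in zip(u, v))
                for grp in groups.values() for u in grp for v in grp), default=0.0)

eps = 1.0
n_intervals = math.ceil(4 / eps)      # node spacing 1/n_intervals <= eps/4
sub = 10                              # fine-grid points per node interval
rng = random.Random(0)
funcs = [random_lipschitz(rng, n_intervals * sub + 1) for _ in range(300)]
groups = classes_by_label(funcs, eps, sub)
print(len(groups), max_diameter(groups))
```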


4.44 Arzelà-Ascoli Theorem II. Let X be a σ-compact LCH space. If $\{f_n\}$ is an
equicontinuous, pointwise bounded sequence in C(X), there exist $f \in C(X)$ and a
subsequence of $\{f_n\}$ that converges to f uniformly on compact sets.


Proof. By Proposition 4.39 there is a sequence $\{U_k\}$ of precompact open sets
such that $\overline{U_k} \subset U_{k+1}$ and $X = \bigcup_1^\infty U_k$. By Theorem 4.43 there is a subsequence
$\{f_{n_j}\}_{j=1}^\infty$ of $\{f_n\}$ that is uniformly Cauchy on $\overline{U_1}$; we denote it by $\{f_j^1\}_{j=1}^\infty$. Proceeding
inductively, for $k \in \mathbb{N}$ we obtain a subsequence $\{f_j^k\}_{j=1}^\infty$ of $\{f_j^{k-1}\}_{j=1}^\infty$ that
is uniformly Cauchy on $\overline{U_k}$. Let $g_k = f_k^k$; then $\{g_k\}$ is a subsequence of $\{f_n\}$ which
is (except for the first k − 1 terms) a subsequence of $\{f_j^k\}_{j=1}^\infty$ and hence is uniformly
Cauchy on each $\overline{U_k}$. Let $f = \lim g_k$. Then $f \in C(X)$ and $g_k \to f$ uniformly on
compact sets by Propositions 4.38 and 4.40. ∎


Exercises 


58. If $\{X_\alpha\}_{\alpha \in A}$ is a family of topological spaces of which infinitely many are
noncompact, then every closed compact subset of $\prod_{\alpha \in A} X_\alpha$ is nowhere dense.


59. The product of finitely many locally compact spaces is locally compact. 


60. The product of countably many sequentially compact spaces is sequentially 
compact. (Use the “diagonal trick” as in the proof of Theorem 4.44.) 


61. Theorem 4.43 remains valid for maps from a compact Hausdorff space X into 
a complete metric space Y provided the hypothesis of pointwise boundedness is 
replaced by pointwise total boundedness. (Make this statement precise and then 
prove it.) 


62. Rephrase Theorem 4.44 in a form similar to Theorem 4.43 by using the metric 
in Exercise 56d. 


63. Let $K \in C([0, 1] \times [0, 1])$. For $f \in C([0, 1])$, let $Tf(x) = \int_0^1 K(x, y) f(y)\, dy$.
Then $Tf \in C([0, 1])$, and $\{Tf : \|f\|_u \le 1\}$ is precompact in $C([0, 1])$.


64. Let (X, ρ) be a metric space. A function f ∈ C(X) is called Hölder continuous of exponent α (α > 0) if the quantity

$$N_\alpha(f) = \sup_{x \ne y} \frac{|f(x) - f(y)|}{\rho(x, y)^\alpha}$$

is finite. If X is compact, {f ∈ C(X) : ‖f‖_u ≤ 1 and N_α(f) ≤ 1} is compact in C(X).
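The quantity N_α(f) can be explored numerically. The sketch below uses a hypothetical helper holder_quotient and assumes X = [0, 1] with ρ(x, y) = |x − y|; it maximizes the quotient in the definition over pairs of distinct points of a finite grid, so it only gives a lower bound for N_α(f).

```python
def holder_quotient(f, alpha, xs):
    """Largest value of |f(x) - f(y)| / |x - y|**alpha over distinct grid
    points x, y in xs (a lower bound for N_alpha(f) on [0, 1])."""
    best = 0.0
    for i, x in enumerate(xs):
        for y in xs[i + 1:]:
            best = max(best, abs(f(x) - f(y)) / abs(x - y) ** alpha)
    return best
```

With f(x) = x^{1/2} and the grid xs = [k / 200 for k in range(201)], the estimate for α = 1/2 is 1 (attained at pairs involving 0), while for α = 1 it already exceeds 14 and keeps growing as the grid is refined: √x is Hölder continuous of exponent 1/2 on [0, 1] but not Lipschitz.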


65. Let U be an open subset of C, and let {f_n} be a sequence of holomorphic
functions on U. If {f_n} is uniformly bounded on compact subsets of U, there is a
subsequence that converges uniformly to a holomorphic function on compact subsets 
of U. (Use the Cauchy integral formula to obtain equicontinuity.) 


4.7 THE STONE-WEIERSTRASS THEOREM 


In this section we prove a far-reaching generalization of the well-known theorem of Weierstrass to the effect that any continuous function on a compact interval [a, b] is the uniform limit of polynomials on [a, b]. Throughout this section, X will denote a compact Hausdorff space, and we equip the space C(X) with the uniform metric.

A subset A of C(X, R) or C(X) is said to separate points if for every x, y ∈ X with x ≠ y there exists f ∈ A such that f(x) ≠ f(y). A is called an algebra if it is a real (resp. complex) vector subspace of C(X, R) (resp. C(X)) such that fg ∈ A whenever f, g ∈ A. If A ⊂ C(X, R), A is called a lattice if max(f, g) and min(f, g) are in A whenever f, g ∈ A. Since the algebra and lattice operations are continuous, one easily sees that if A is an algebra or a lattice, so is its closure Ā in the uniform metric.


4.45 The Stone-Weierstrass Theorem. Let X be a compact Hausdorff space. If A is a closed subalgebra of C(X, R) that separates points, then either A = C(X, R) or A = {f ∈ C(X, R) : f(x_0) = 0} for some x_0 ∈ X. The first alternative holds iff A contains the constant functions.


The proof will require several lemmas. The first one, in effect, proves the theorem 
when X consists of two points, and the second one is a special case of the classical 
Weierstrass theorem for X = [−1, 1]. After these two we return to the general case.


4.46 Lemma. Consider R² as an algebra under coordinatewise addition and multiplication. Then the only subalgebras of R² are R², {(0, 0)}, and the linear spans of (1, 0), (0, 1), and (1, 1).


Proof. The subspaces of R² listed above are evidently subalgebras. If A ⊂ R² is a nonzero algebra and (0, 0) ≠ (a, b) ∈ A, then (a², b²) ∈ A. If a ≠ 0, b ≠ 0, and a ≠ b, then (a, b) and (a², b²) are linearly independent (their determinant is ab² − a²b = ab(b − a) ≠ 0), so A = R². The other possibilities (a ≠ 0 = b, a = 0 ≠ b, and a = b ≠ 0 for all nonzero (a, b) ∈ A) give the other three subalgebras. ∎


4.47 Lemma. For any ε > 0 there is a polynomial P on R such that P(0) = 0 and ||x| − P(x)| < ε for x ∈ [−1, 1].


Proof. Consider the Maclaurin series for (1 − t)^{1/2}:

$$(1-t)^{1/2} = 1 - \sum_{n=1}^\infty c_n t^n \qquad (c_n > 0).$$

By the ratio test, this series converges for |t| < 1; a proof that its sum is actually (1 − t)^{1/2} is outlined in Exercise 66. Moreover, by the monotone convergence theorem (applied to counting measure on N),

$$\sum_{n=1}^\infty c_n = \lim_{t \nearrow 1} \sum_{n=1}^\infty c_n t^n = 1 - \lim_{t \nearrow 1} (1-t)^{1/2} = 1.$$







It follows from the finiteness of Σ_1^∞ c_n that the series 1 − Σ_1^∞ c_n t^n converges absolutely and uniformly on [−1, 1], and its sum is (1 − t)^{1/2} there. Therefore, given ε > 0, by taking a suitable partial sum of this series we obtain a polynomial Q such that |(1 − t)^{1/2} − Q(t)| < ε/2 for t ∈ [−1, 1]. Setting t = 1 − x² and R(x) = Q(1 − x²), we obtain a polynomial R such that ||x| − R(x)| < ε/2 for x ∈ [−1, 1]. In particular, |R(0)| < ε/2, so if we set P(x) = R(x) − R(0), P is a polynomial such that P(0) = 0 and ||x| − P(x)| < ε for x ∈ [−1, 1]. ∎
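The construction in this proof is completely explicit, so it can be carried out numerically. The sketch below (helper names are illustrative) generates the coefficients from the binomial series, which gives c_1 = 1/2 and c_n = c_{n−1}(n − 3/2)/n for n ≥ 2, forms R(x) = Q(1 − x²) with Q the N-th partial sum, and returns P(x) = R(x) − R(0); it then measures the error on a grid in [−1, 1].

```python
def sqrt_series_coeffs(N):
    """c_1, ..., c_N with (1 - t)**0.5 = 1 - sum_{n>=1} c_n t**n."""
    coeffs = [0.5]
    for n in range(2, N + 1):
        coeffs.append(coeffs[-1] * (n - 1.5) / n)
    return coeffs

def make_P(N):
    """P(x) = R(x) - R(0), where R(x) = Q(1 - x**2) and
    Q(t) = 1 - sum_{n<=N} c_n t**n, as in the proof of Lemma 4.47."""
    c = sqrt_series_coeffs(N)

    def R(x):
        t = 1.0 - x * x
        total, power = 1.0, 1.0
        for cn in c:
            power *= t
            total -= cn * power
        return total

    r0 = R(0.0)
    return lambda x: R(x) - r0

def max_error(P, num_points=201):
    """Max of | |x| - P(x) | over an evenly spaced grid on [-1, 1]."""
    xs = [-1.0 + 2.0 * k / (num_points - 1) for k in range(num_points)]
    return max(abs(abs(x) - P(x)) for x in xs)
```

Since every c_n > 0 and Σ c_n = 1, in this construction the largest error occurs at x = ±1 and equals the tail Σ_{n>N} c_n, which decays only roughly like N^{−1/2}: the method is effective but slow (make_P(2000) gives an error of about 0.013).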


4.48 Lemma. If A is a closed subalgebra of C(X, R), then |f| ∈ A whenever f ∈ A, and A is a lattice.


Proof. If f ∈ A and f ≠ 0, let h = f/‖f‖_u. Then h maps X into [−1, 1], so if ε > 0 and P is as in Lemma 4.47, we have ‖|h| − P ∘ h‖_u < ε. Since P(0) = 0, P has no constant term, so P ∘ h ∈ A since A is an algebra. Since A is closed and ε is arbitrary, we have |h| ∈ A and hence |f| = ‖f‖_u |h| ∈ A. This proves the first assertion, and the second one follows because

$$\max(f,g) = \tfrac12\bigl(f + g + |f - g|\bigr), \qquad \min(f,g) = \tfrac12\bigl(f + g - |f - g|\bigr).$$

∎
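These identities express the lattice operations using only the vector operations and the absolute value, which is exactly why |f| ∈ A forces A to be a lattice. A minimal sketch of the two formulas for real-valued functions of one variable (function names are illustrative):

```python
def lattice_max(f, g):
    """max(f, g) = (f + g + |f - g|) / 2, using only algebra operations and |.|."""
    return lambda x: 0.5 * (f(x) + g(x) + abs(f(x) - g(x)))

def lattice_min(f, g):
    """min(f, g) = (f + g - |f - g|) / 2."""
    return lambda x: 0.5 * (f(x) + g(x) - abs(f(x) - g(x)))
```

Up to floating-point rounding, lattice_max(f, g)(x) agrees with max(f(x), g(x)) at every point.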


4.49 Lemma. Suppose A is a closed lattice in C(X, R) and f ∈ C(X, R). If for every x, y ∈ X there exists g_{xy} ∈ A such that g_{xy}(x) = f(x) and g_{xy}(y) = f(y), then f ∈ A.


Proof. Given ε > 0, for each x, y ∈ X let U_{xy} = {z ∈ X : f(z) < g_{xy}(z) + ε} and V_{xy} = {z ∈ X : f(z) > g_{xy}(z) − ε}. These sets are open and contain x and y. Fix y; then {U_{xy} : x ∈ X} covers X, so there is a finite subcover {U_{x_j y}}_1^n. Let g_y = max(g_{x_1 y}, …, g_{x_n y}); then f < g_y + ε on X and f > g_y − ε on V_y = ⋂_1^n V_{x_j y}, which is open and contains y. Thus {V_y}_{y∈X} is another open cover of X, so there is a finite subcover {V_{y_k}}_1^m. Let g = min(g_{y_1}, …, g_{y_m}); then ‖f − g‖_u < ε. Since A is a lattice, g ∈ A, and since A is closed and ε is arbitrary, f ∈ A. ∎


Proof of Theorem 4.45. Given x ≠ y ∈ X, let A_{xy} = {(f(x), f(y)) : f ∈ A}. Then A_{xy} is a subalgebra of R² as in Lemma 4.46 because f ↦ (f(x), f(y)) is an algebra homomorphism. If A_{xy} = R² for all x, y, then Lemmas 4.48 and 4.49 imply that A = C(X, R). Otherwise, there exist x, y for which A_{xy} is a proper subalgebra of R². It cannot be {(0, 0)} or the linear span of (1, 1) because A separates points, so by Lemma 4.46 A_{xy} is the linear span of (1, 0) or (0, 1). In either case there exists x_0 ∈ X such that f(x_0) = 0 for all f ∈ A. There is only one such x_0 since A separates points, so if neither x nor y is x_0, we have A_{xy} = R². Lemmas 4.48 and 4.49 now imply that A = {f ∈ C(X, R) : f(x_0) = 0}. Finally, if A contains the constant functions, there is no x_0 such that f(x_0) = 0 for all f ∈ A, so A must equal C(X, R). ∎
We have stated the Stone-Weierstrass theorem in the form that is most natural for the proof. However, in applications one is typically dealing with a subalgebra B of C(X, R) that is not closed, and one applies the theorem to A = B̄. The resulting restatement of the theorem is as follows:


4.50 Corollary. Suppose B is a subalgebra of C(X, R) that separates points. If there exists x_0 ∈ X such that f(x_0) = 0 for all f ∈ B, then B is dense in {f ∈ C(X, R) : f(x_0) = 0}. Otherwise, B is dense in C(X, R).


The classical Weierstrass approximation theorem is the special case of this corollary where X is a compact subset of Rⁿ and B is the algebra of polynomials on Rⁿ (restricted to X); here B contains the constant functions, so the conclusion is that it is dense in C(X, R).
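The Stone-Weierstrass theorem does not tell us how to find approximating polynomials. For the classical case X = [0, 1] there is a well-known constructive alternative, the Bernstein polynomials B_n f(x) = Σ_{k=0}^n f(k/n) C(n, k) x^k (1 − x)^{n−k}, which converge uniformly to f. This is a different technique from the proof above, shown here only as an illustration; the helper names are hypothetical.

```python
from math import comb

def bernstein(f, n):
    """Return the n-th Bernstein polynomial of f on [0, 1] as a callable."""
    values = [f(k / n) for k in range(n + 1)]

    def B(x):
        return sum(v * comb(n, k) * x**k * (1 - x) ** (n - k)
                   for k, v in enumerate(values))
    return B

def sup_error(f, g, num_points=201):
    """Max of |f - g| over an evenly spaced grid on [0, 1]."""
    xs = [k / (num_points - 1) for k in range(num_points)]
    return max(abs(f(x) - g(x)) for x in xs)
```

For f(x) = |x − 1/2| the error shrinks as n grows (for n = 400 it is below 0.03), but slowly, roughly like n^{−1/2}; the kink at x = 1/2 is the hardest point to approximate.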

The Stone-Weierstrass theorem, as it stands, is false for complex-valued functions. For example, the algebra of polynomials in one complex variable is not dense in C(K) for most compact subsets K of C. (In particular, if K° ≠ ∅, any uniform limit of polynomials on K must be holomorphic on K°.) Here we shall give a simple proof that the function f(z) = z̄ cannot be approximated uniformly by polynomials on the unit circle {e^{it} : t ∈ [0, 2π]}. If P(z) = Σ_0^n a_j z^j, then


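$$\int_0^{2\pi} \overline{f(e^{it})}\,P(e^{it})\,dt = \sum_{j=0}^n a_j \int_0^{2\pi} e^{i(j+1)t}\,dt = 0.$$

Thus, abbreviating f(e^{it}) and P(e^{it}) by f and P, since |f| = 1 on the unit circle we have

$$2\pi = \int_0^{2\pi} |f|^2\,dt = \left|\int_0^{2\pi} (f-P)\bar f\,dt + \int_0^{2\pi} P\bar f\,dt\right| = \left|\int_0^{2\pi} (f-P)\bar f\,dt\right| \le \int_0^{2\pi} |f-P|\,dt \le 2\pi\|f-P\|_u.$$

Therefore, ‖f − P‖_u ≥ 1 for any polynomial P.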
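This argument has an exact discrete counterpart that is easy to check by machine. If z_k = e^{2πik/M} (k = 0, …, M − 1) and M > n + 1, then (1/M) Σ_k z_k^{j+1} = 0 for 0 ≤ j ≤ n, so the same chain of inequalities, with averages over the z_k in place of integrals, shows that max_k |z̄_k − P(z_k)| ≥ 1. The sketch below (helper names are hypothetical) evaluates both quantities for a given coefficient list [a_0, …, a_n].

```python
import cmath
import math

def circle_points(M):
    """M equally spaced points e^{2 pi i k / M} on the unit circle."""
    return [cmath.exp(2j * math.pi * k / M) for k in range(M)]

def poly(coeffs, z):
    """Evaluate P(z) = sum_j a_j z**j for coeffs = [a_0, ..., a_n]."""
    return sum(a * z**j for j, a in enumerate(coeffs))

def mean_P_times_conj_f(coeffs, M):
    """Average of P(z_k) * conj(f(z_k)) with f(z) = conj(z), i.e. of P(z_k) * z_k.
    This vanishes (up to rounding) whenever M > deg P + 1."""
    return sum(poly(coeffs, z) * z for z in circle_points(M)) / M

def sup_error_on_grid(coeffs, M):
    """max_k |conj(z_k) - P(z_k)| over the M grid points."""
    return max(abs(z.conjugate() - poly(coeffs, z)) for z in circle_points(M))
```

For any polynomial of degree at most 62 and M = 64, sup_error_on_grid is at least 1 up to rounding; the zero polynomial attains this bound.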
There is, however, a complex version of the Stone-Weierstrass theorem. 


4.51 The Complex Stone-Weierstrass Theorem. Let X be a compact Hausdorff space. If A is a closed complex subalgebra of C(X) that separates points and is closed under complex conjugation, then either A = C(X) or A = {f ∈ C(X) : f(x_0) = 0} for some x_0 ∈ X.


Proof. Since Re f = (f + f̄)/2 and Im f = (f − f̄)/2i, the set A_R of real and imaginary parts of functions in A is a subalgebra of C(X, R) to which the Stone-Weierstrass theorem applies. Since A = {f + ig : f, g ∈ A_R}, the desired result follows. ∎


There is also a version of the Stone-Weierstrass theorem for noncompact LCH 
spaces. We state this result for real functions; the corresponding analogue of Theorem 
4.51 for complex functions is an immediate consequence. 
4.52 Theorem. Let X be a noncompact LCH space. If A is a closed subalgebra of C_0(X, R) (= C_0(X) ∩ C(X, R)) that separates points, then either A = C_0(X, R) or A = {f ∈ C_0(X, R) : f(x_0) = 0} for some x_0 ∈ X.


The proof is outlined in Exercise 67. 


Exercises 


66. Let 1 − Σ_1^∞ c_n t^n be the Maclaurin series for (1 − t)^{1/2}.
a. The series converges absolutely and uniformly on compact subsets of (−1, 1), as does the termwise differentiated series −Σ_1^∞ n c_n t^{n−1}. Thus, if f(t) = 1 − Σ_1^∞ c_n t^n, then f′(t) = −Σ_1^∞ n c_n t^{n−1}.
b. By explicit calculation, f(t) = −2(1 − t) f′(t), from which it follows that (1 − t)^{−1/2} f(t) is constant. Since f(0) = 1, f(t) = (1 − t)^{1/2}.

67. Prove Theorem 4.52. (If there exists x_0 ∈ X such that f(x_0) = 0 for all f ∈ A, let Y be the one-point compactification of X \ {x_0}; otherwise let Y be the one-point compactification of X. Apply Proposition 4.36 and the Stone-Weierstrass theorem on Y.)


68. Let X and Y be compact Hausdorff spaces. The algebra generated by functions of the form f(x, y) = g(x)h(y), where g ∈ C(X) and h ∈ C(Y), is dense in C(X × Y).


69. Let A be a nonempty set, and let X = [0, 1]^A. The algebra generated by the coordinate maps π_α : X → [0, 1] (α ∈ A) and the constant function 1 is dense in C(X).


70. Let X be a compact Hausdorff space. An ideal in C(X, R) is a subalgebra J of C(X, R) such that if f ∈ J and g ∈ C(X, R) then fg ∈ J.
a. If J is an ideal in C(X, R), let h(J) = {x ∈ X : f(x) = 0 for all f ∈ J}. Then h(J) is a closed subset of X, called the hull of J.
b. If E ⊂ X, let k(E) = {f ∈ C(X, R) : f(x) = 0 for all x ∈ E}. Then k(E) is a closed ideal in C(X, R), called the kernel of E.
c. If E ⊂ X, then h(k(E)) = Ē.
d. If J is an ideal in C(X, R), then k(h(J)) = J̄. (Hint: k(h(J)) may be identified with a subalgebra of C_0(U, R) where U = X \ h(J).)
e. The closed subsets of X are in one-to-one correspondence with the closed ideals of C(X, R).


71. (This is a variation on the theme of Exercise 70; it does not use the Stone- 
Weierstrass theorem.) Let X be a compact Hausdorff space, and let M be the set of 
all nonzero algebra homomorphisms from C(X, R) to R. Each x ∈ X defines an element x̂ of M by x̂(f) = f(x).
a. If φ ∈ M, then {f ∈ C(X, R) : φ(f) = 0} is a maximal proper ideal in C(X, R).
b. If J is a proper ideal in C(X, R), there exists x_0 ∈ X such that f(x_0) = 0 for all f ∈ J. (Suppose not; construct an f ∈ J with f > 0 everywhere and conclude that 1 ∈ J. This requires no deep theorems.)
c. The map x ↦ x̂ is a bijection from X to M.
d. If M is equipped with the topology of pointwise convergence, then the map x ↦ x̂ is a homeomorphism from X to M. (Since M is defined purely algebraically, it follows that the topological structure of X is completely determined by the algebraic structure of C(X, R).)


4.8 EMBEDDINGS IN CUBES 


We now present a technique for embedding topological spaces in products of intervals and discuss some of its applications. (These results will not be used elsewhere in this book.) Throughout this section we shall denote the unit interval [0, 1] by I, and if A is any nonempty set, we shall call the product space I^A a cube.

If X is a topological space and F ⊂ C(X, I), we say that F separates points and closed sets if for every closed E ⊂ X and every x ∈ E^c there exists f ∈ F such that f(x) ∉ $\overline{f(E)}$. If F separates points and closed sets, there is another family G ⊂ C(X, I) with the following slightly stronger property: For every closed set E ⊂ X and every x ∈ E^c there exists g ∈ G with g(x) = 1 and g = 0 on E. (Indeed, if f ∈ F satisfies f(x) ∉ $\overline{f(E)}$, take g = φ ∘ f where φ ∈ C(I, I), φ(f(x)) = 1, and φ = 0 on $\overline{f(E)}$.) It follows that a T_1 space X admits a family F that separates points and closed sets iff X is completely regular.

Each nonempty F ⊂ C(X, I) canonically induces a map e : X → I^F by the formula π_f(e(x)) = f(x), where π_f : I^F → I is the coordinate map. We call e the map from X into the cube I^F associated to F. (Evidently this construction can be generalized to target spaces other than I; see Exercise 20.)
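Concretely, e(x) is just the indexed family of values (f(x))_{f∈F}. The sketch below represents a point of I^F as a Python dict keyed by (hypothetical) names for the functions in F. It also illustrates Proposition 4.53(b): e is injective when F separates points, and can fail to be injective otherwise.

```python
def cube_map(F):
    """The map e : X -> I^F with pi_f(e(x)) = f(x).

    F is a dict {name: function X -> [0, 1]}; a point of the cube I^F is
    represented as a dict {name: coordinate}."""
    def e(x):
        return {name: f(x) for name, f in F.items()}
    return e
```

With F = {"id": x ↦ x, "sq": x ↦ x²} on X = [0, 1], e(0.5) is {"id": 0.5, "sq": 0.25} and e is injective; with a single constant function, every point of X is sent to the same point of the cube.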








4.53 Proposition. Let X be a topological space, F ⊂ C(X, I), and e : X → I^F be the map associated to F. Then

a. e is continuous.
b. If F separates points, then e is injective.
c. If X is T_1 and F separates points and closed sets, then e is an embedding.


Proof. (a) follows from Proposition 4.11, and (b) is obvious. Next, observe that if F separates points and closed sets and X is T_1, then e is injective by (b) and Proposition 4.7. To prove the continuity of the inverse, suppose that U is open in X. If x ∈ U, choose f ∈ F with f(x) ∉ $\overline{f(U^c)}$ and let

$$V = \pi_f^{-1}\Bigl(\bigl(\overline{f(U^c)}\bigr)^c\Bigr) = \bigl\{y \in I^{\mathcal F} : y(f) \notin \overline{f(U^c)}\bigr\}.$$

Then V is open in I^F and e(x) ∈ V ∩ e(X) ⊂ e(U). Thus e(U) is a neighborhood of e(x) in e(X) at every x ∈ U, so e(U) is open in e(X). It follows that e^{−1} is continuous. ∎
4.54 Corollary. Every compact Hausdorff space is homeomorphic to a closed subset of a cube.


Proof. By Proposition 4.25 and Urysohn’s lemma, we can take F = C(X, I). ∎


4.55 Corollary. A topological space is completely regular iff it is homeomorphic to 
a subset of a compact Hausdorff space. 


Proof. Proposition 4.53, with F = C(X, I), gives the “only if” implication; the converse is left to the reader (Exercise 72). ∎


A compactification of a topological space X is a pair (Y, φ) where Y is a compact Hausdorff space and φ is a homeomorphism from X onto a dense subset of Y. (Frequently one identifies X with its image φ(X) ⊂ Y and then speaks simply of “the compactification Y of X.”) For example, ([−1, 1], tanh) is a compactification of R, and the one-point compactification (X*, ι) of an LCH space X is a compactification in the present sense, where ι : X → X* is the inclusion map.

Suppose X is completely regular. According to Proposition 4.53, if F ⊂ C(X, I) separates points and closed sets, e : X → I^F is the associated embedding, and Y is the closure of e(X) in I^F, then (Y, e) is a compactification of X. It has the property that if we identify X with its image e(X), every f ∈ F has a continuous extension to Y, which is unique since X is dense in Y. Indeed, the identification of X with e(X) turns f into the coordinate map π_f|e(X), which extends to π_f|Y. Moreover, if f and g are bounded continuous functions on X that extend continuously to Y, obviously so are f + g and fg, and if {f_n} is a uniformly convergent sequence of functions on X that extend continuously to Y, their extensions converge uniformly on Y since X is dense in Y, so f = lim f_n also extends continuously. We have proved:


4.56 Proposition. Suppose that F ⊂ C(X, I) separates points and closed sets. Let (Y, e) be the compactification of X associated to F, and let A be the smallest closed subalgebra of BC(X) that contains F. Then every f ∈ A has a continuous extension to Y.


This result has a converse: see Exercise 73. 

If X is a completely regular space, the compactification of X associated to F = C(X, I) is called the Stone-Čech compactification of X and is denoted by (βX, e), or simply by βX if we identify X with e(X). Every f ∈ BC(X) extends continuously to βX; in fact, a much more general result holds:


4.57 Theorem. If X is a completely regular space, Y is a compact Hausdorff space, and φ ∈ C(X, Y), then φ has a unique continuous extension φ̃ to βX; that is, there is a unique φ̃ ∈ C(βX, Y) such that φ̃ ∘ e = φ. If (Y, φ) is a compactification of X, then φ̃ is surjective; if also every f ∈ BC(X) extends continuously to Y (i.e., f = g ∘ φ for some g ∈ C(Y)), then φ̃ is a homeomorphism.


Proof. Let F = C(X, I) and G = C(Y, I), and let (βY, i) be the Stone-Čech compactification of Y. (That is, i : Y → I^G is the embedding associated to G, and βY = $\overline{i(Y)}$; βY is homeomorphic to Y since Y is compact.) Given φ ∈ C(X, Y), define Φ : I^F → I^G by π_g(Φ(p)) = π_{g∘φ}(p). The map Φ is continuous by Proposition 4.11, and

$$\pi_g(\Phi(e(x))) = \pi_{g\circ\phi}(e(x)) = g(\phi(x)) = \pi_g(i(\phi(x))),$$

that is, Φ ∘ e = i ∘ φ. It follows that Φ(e(X)) = i(φ(X)) ⊂ βY and hence that Φ(βX) ⊂ $\overline{\beta Y}$ = βY. The situation is summarized in the following commutative diagram:

$$\begin{array}{ccc} X & \xrightarrow{\;e\;} & \beta X \\ {\scriptstyle\phi}\big\downarrow & & \big\downarrow{\scriptstyle\Phi} \\ Y & \xrightarrow{\;i\;} & \beta Y \end{array}$$

Let φ̃ = i^{−1} ∘ (Φ|βX). Then φ̃ ∘ e = i^{−1} ∘ Φ ∘ e = φ, and uniqueness of φ̃ is clear since e(X) is dense in βX; thus the first assertion is proved. If (Y, φ) is a compactification of X, then φ(X) is dense in Y; but then φ̃(βX) is dense in Y and also compact, so that φ̃(βX) = Y. Finally, if every f ∈ BC(X) is of the form g ∘ φ for some g ∈ C(Y), then Φ is injective; hence φ̃ is bijective and therefore, by Proposition 4.28, a homeomorphism. ∎


This theorem shows that βX is the “largest” compactification of a completely regular space X, in the sense that every other compactification is a continuous image of it. At the other end of the scale, if X is locally compact, then F = C_c(X) ∩ C(X, I) separates points and closed sets by Urysohn’s lemma. A glance at the construction of the compactification (Y, e) associated to this F shows that Y consists of e(X) together with the single point of I^F all of whose coordinates are zero. It is then easy to verify that Y is homeomorphic to the one-point compactification of X constructed in §4.5.

As a final application of the embedding e : X → I^F, we give a partial answer to the question: When is a topological space metrizable, that is, when is its topology defined by a metric? A necessary condition for X to be metrizable is that X be normal (Exercise 3). On the other hand:


4.58 The Urysohn Metrization Theorem. Every second countable normal space 
is metrizable. 


Since every subset of a metrizable space is metrizable (with the same metric), 
this theorem is an immediate consequence of Proposition 4.53 and the following two 
facts, whose proofs are outlined in Exercises 76 and 77: 


• If X is normal and second countable, there is a countable family F ⊂ C(X, I) that separates points and closed sets.

• If F is countable, I^F is metrizable.
Exercises


72. Every subset of a completely regular space is completely regular in the relative 
topology. 


73. If X is a completely regular space, a subalgebra A of BC(X) is called completely regular if (i) it is closed and contains the constant functions, and (ii) A ∩ C(X, I) separates points and closed sets.
a. If (Y, e) is a Hausdorff compactification of X, A_Y = {f ∘ e : f ∈ C(Y)} is a completely regular subalgebra of BC(X).
b. If (Y, e) and (Y′, e′) are Hausdorff compactifications of X such that A_Y = A_{Y′}, there is a homeomorphism φ : Y → Y′ such that φ ∘ e = e′. (Adapt the proof of Theorem 4.57, which deals with the case Y = βX.)
c. If (Y, e) is the compactification of X associated to F ⊂ C(X, I), then A_Y is the smallest closed subalgebra of BC(X) that contains F. (Use Exercise 69.)
d. The Hausdorff compactifications of X are in one-to-one correspondence with the completely regular subalgebras of BC(X).


74. Consider N (with the discrete topology) as a subset of its Stone-Čech compactification βN.
a. If A and B are disjoint subsets of N, their closures in βN are disjoint. (Hint: χ_A ∈ C(N, I).)
b. No sequence in N converges in βN unless it is eventually constant (so βN is emphatically not sequentially compact).


75. Suppose X is a completely regular space. The set M of nonzero algebra homomorphisms from BC(X, R) to R, equipped with the topology of pointwise convergence, is homeomorphic to βX. (See Exercise 71. This realization of βX is the natural one from the point of view of Banach algebra theory.)


76. If X is normal and second countable, there is a countable family F ⊂ C(X, I) that separates points and closed sets. (Let B be a countable base for the topology. Consider the set of pairs (U, V) ∈ B × B such that Ū ⊂ V, and use Urysohn’s lemma.)


77. Let {(X_n, ρ_n)}_1^∞ be a countable family of metric spaces whose metrics take values in [0, 1]. (The latter restriction can always be satisfied; see Exercise 56b.) Let X = ∏_1^∞ X_n. If x, y ∈ X, say x = (x_1, x_2, …) and y = (y_1, y_2, …), define ρ(x, y) = Σ_1^∞ 2^{−n} ρ_n(x_n, y_n). Then ρ is a metric that defines the product topology on X.
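The metric of Exercise 77 can be evaluated approximately by truncating the series; because each ρ_n takes values in [0, 1], dropping the terms with n > N changes ρ(x, y) by at most 2^{−N}. The sketch below assumes, for illustration only, that every factor is X_n = [0, 1] with ρ_n(s, t) = |s − t|, and represents a point of the product as a callable n ↦ x_n (a hypothetical encoding).

```python
def product_metric(x, y, rho=lambda s, t: abs(s - t), N=50):
    """Truncation of rho(x, y) = sum_{n>=1} 2**-n * rho_n(x_n, y_n).

    x, y: callables n -> n-th coordinate (n = 1, 2, ...).
    rho:  metric used for every factor, assumed to take values in [0, 1]
          (here |s - t| on [0, 1]), so the omitted tail is at most 2**-N."""
    return sum(2.0 ** -n * rho(x(n), y(n)) for n in range(1, N + 1))
```

Two points that differ only in the third coordinate (by 1) are at distance 2^{−3} = 0.125, and points that agree in their first N coordinates are within 2^{−N} of each other; this is the mechanism by which ρ generates the product topology.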


4.9 NOTES AND REFERENCES 


The germ of the concept of topological space is clearly present in Riemann’s lecture 
[113] on the foundations of geometry, delivered in 1854, but another half century 
passed before the mathematical world was ready to consider abstract spaces in a 
systematic way. The first attempt to construct an abstract framework for the study of
limits and continuity was made in 1906 by Fréchet [52], who introduced metric spaces 
as well as a more general class of quasi-topological spaces whose properties were 
defined in terms of sequential convergence. A few years later Hausdorff [68] devised 
axioms for neighborhoods of points that amount to the definition of a Hausdorff 
space, and he deduced from them many of the basic results of general topology. The 
usefulness of his point of view was quickly recognized, and it became the foundation 
for the further development of the subject. 

There are several good books to which the reader may refer for a more com- 
prehensive treatment of point set topology, including Bourbaki [20], Dugundji [34], 
Engelking [38], Kelley [83], and Nagata [102]. Engelking [38] contains extensive 
references and historical notes. 


§4.2: Urysohn’s lemma and the Tietze extension theorem were both first proved
in Urysohn [152]. Special cases of the latter had previously been obtained by several 
authors, including Tietze (see [152] for references). Examples of completely regular 
spaces that are not normal and regular spaces that are not completely regular, which 
are all rather complicated, were first constructed by Tychonoff [151]. Particularly 
noteworthy is the existence of a regular space that admits no nonconstant continuous 
functions, a result due to Hewitt [73]. Examples may also be found in the books cited 
above. 


§4.3: The theory of nets is sometimes called the Moore-Smith theory of convergence, after its originators [101]. Another general theory of convergence, invented by H. Cartan and publicized by Bourbaki, is based on the notion of filters. A filter in a set X is a family 𝓕 ⊂ P(X) with the following properties:

• If F ∈ 𝓕 and E ⊃ F, then E ∈ 𝓕.
• If F_1 ∈ 𝓕 and F_2 ∈ 𝓕, then F_1 ∩ F_2 ∈ 𝓕.
• ∅ ∉ 𝓕.

If X is a topological space, a filter 𝓕 in X converges to x ∈ X if every neighborhood of x belongs to 𝓕. Filters and nets are related as follows. If ⟨x_α⟩_{α∈A} is a net in X, its derived filter is the collection of all E ⊂ X such that ⟨x_α⟩ is eventually in E. On the other hand, if 𝓕 is a filter, then 𝓕 is a directed set under reverse inclusion, and a net ⟨x_F⟩_{F∈𝓕} indexed by 𝓕 is said to be associated to 𝓕 if x_F ∈ F for all F ∈ 𝓕. It is then easy to verify that a net ⟨x_α⟩ converges to x iff its derived filter converges to x, and a filter 𝓕 converges to x iff all of its associated nets converge to x. See Bourbaki [20] or Dugundji [34] for more information.


§4.4: The usage of the term “compact” is not completely standardized. In many
older works the terms “compact” and “bicompact” were used to mean countably 
compact and compact, respectively, and some authors use “compact” and “quasi- 
compact” to mean compact Hausdorff and compact, respectively. Synonyms for 
“precompact” that are frequently found in the literature are “conditionally compact” 
and “relatively compact”; the latter one is infelicitous because it suggests compactness 
in the relative topology, which is quite different. 
§4.6: Tychonoff [151] proved that [0, 1]^A is compact for any set A; together with Corollary 4.54, which is in the same paper, this easily implies that any product of compact Hausdorff spaces is compact. The Tychonoff theorem in full generality is due to Čech [23]. The proof we have presented, which is simpler and more elegant than the older ones, is due to Chernoff [24].

The axiom of choice, usually in the form of Zorn’s lemma, is an essential ingredient 
in all the proofs of Tychonoff’s theorem. It is an intriguing fact, discovered by Kelley 
[82], that Tychonoff’s theorem in turn implies the axiom of choice. Here is the proof: 

Suppose that {X_α}_{α∈A} is a nonempty collection of nonempty sets. Pick a point ω that is not an element of any X_α, set X_α* = X_α ∪ {ω}, and define a topology on X_α* by declaring the open sets to be ∅, X_α, {ω}, and X_α*. Evidently X_α* is compact, so Tychonoff’s theorem implies that X* = ∏_{α∈A} X_α* is compact. Let F_α = π_α^{−1}(X_α). The sets F_α are closed, and by the axiom of choice for finite collections of sets (which is provable from the other standard axioms of set theory) they have the finite intersection property. Indeed, given a finite set B ⊂ A, pick x_β ∈ X_β for β ∈ B; then ⋂_{β∈B} F_β contains the point x ∈ X* such that π_β(x) = x_β for β ∈ B and π_α(x) = ω for α ∉ B. By Proposition 4.21, ⋂_{α∈A} F_α, which is precisely ∏_{α∈A} X_α, is nonempty.

By elaboration of this argument, one can deduce the axiom of choice from the special case of Tychonoff’s theorem that X^A is compact for any A if X is compact; see Ward [156].

The original results of Arzelà and Ascoli had to do with functions on R; see Arzelà 
[6]. Other versions of the Arzelà-Ascoli theorem, pertaining to the compactness of 
subsets of C(X, Y) under various hypotheses on X and Y, can be found in the books
cited above and in Royden [121]. 


§4.7: The Stone-Weierstrass theorem first appeared in the middle of a lengthy and
difficult paper of Stone [144]. Later Stone [145] wrote a much-simplified exposition 
of the theorem and some of its applications, which still makes good reading. 


§4.8: The history of this material begins with Urysohn [153], where the metrization theorem is proved, essentially by the method we have outlined. The technique of embedding spaces in cubes is implicit in this paper, but it was first developed explicitly in Tychonoff [151]. The Stone-Čech compactification, in turn, is implicit in the latter paper, but it was first described explicitly and investigated by Stone [144] and Čech [23].

It is not hard to show that every second countable regular space is normal (see
Kelley [82, Lemma 4.1]); consequently, the hypothesis of normality in the Urysohn
metrization theorem can be replaced by regularity. Necessary and sufficient condi- 
tions are known for an arbitrary topological space to be metrizable, but they are not as 
readily verifiable as the conditions in Urysohn’s theorem. See the books cited above. 

Occasionally the term “compactification” is used to mean a continuous injection φ : X → Y from a topological space X onto a dense subset of a compact space Y without the requirement that it be an embedding. Such “compactifications” arise from subalgebras of C(X) that separate points but are not completely regular in the sense of Exercise 73. An example is provided by the algebra of “uniformly almost periodic” functions on R, which is the algebra generated by the functions f(x) = e^{iλx}, λ ∈ R; the associated “compactification” of R is known as the Bohr compactification. See Folland [47, §4.7].





Elements of Functional Analysis


“Functional analysis” is the traditional name for the study of infinite-dimensional vector spaces over R or C and the linear maps between them. What distinguishes this from mere linear algebra is the importance of topological considerations. On finite-dimensional vector spaces there is only one reasonable topology, and linear maps are automatically continuous, but in infinite dimensions things are not so simple. (As we have already observed, if {f_n} is a sequence of functions on R, there are many things one can mean by the statement “f_n → f.”) As our aim in this chapter is only to give a brief introduction to the subject, we shall restrict attention, except in §5.4, to topologies defined by norms on vector spaces.


5.1 NORMED VECTOR SPACES 


Let K denote either R or C, and let X be a vector space over K. We denote the zero element of X simply by 0, relying on context to distinguish it from the scalar 0 ∈ K. By a subspace we shall always mean a vector subspace. If x ∈ X, we denote by Kx the one-dimensional subspace spanned by x. Also, if M and N are subspaces of X, M + N denotes the subspace {x + y : x ∈ M, y ∈ N}.

A seminorm on X is a function x ↦ ‖x‖ from X to [0, ∞) such that

• ‖x + y‖ ≤ ‖x‖ + ‖y‖ for all x, y ∈ X (the triangle inequality),

• ‖λx‖ = |λ| ‖x‖ for all x ∈ X and λ ∈ K.
The second property clearly implies that ‖0‖ = 0. A seminorm such that ‖x‖ = 0 only when x = 0 is called a norm, and a vector space equipped with a norm is called a normed vector space (or normed linear space).


If X is a normed vector space, the function ρ(x, y) = ‖x − y‖ is a metric on X, since

$$\|x - z\| \le \|x - y\| + \|y - z\|, \qquad \|x - y\| = \|(-1)(y - x)\| = \|y - x\|.$$

The topology it defines is called the norm topology on X. Two norms ‖·‖_1 and ‖·‖_2 on X are called equivalent if there exist C_1, C_2 > 0 such that

$$C_1\|x\|_1 \le \|x\|_2 \le C_2\|x\|_1 \qquad (x \in X).$$

Equivalent norms define equivalent metrics and hence the same topology and the same Cauchy sequences.

A normed vector space that is complete with respect to the norm metric is called a Banach space. (Every normed vector space can be embedded in a Banach space as a dense subspace. One way to do this is to mimic the construction of R from Q via Cauchy sequences; we shall present a simpler way in §5.2.) The following is a useful criterion for completeness of a normed vector space. If {x_n} is a sequence in X, the series Σ_1^∞ x_n is said to converge to x if Σ_1^N x_n → x as N → ∞, and it is called absolutely convergent if Σ_1^∞ ‖x_n‖ < ∞.


5.1 Theorem. A normed vector space X is complete iff every absolutely convergent 
series in X converges. 


Proof. If X is complete and Σ_1^∞ ‖x_n‖ < ∞, let S_N = Σ_1^N x_n. Then for N > M we have

$$\|S_N - S_M\| \le \sum_{n=M+1}^{N} \|x_n\| \to 0 \quad \text{as } M, N \to \infty,$$

so the sequence {S_N} is Cauchy and hence convergent. Conversely, suppose that every absolutely convergent series converges, and let {x_n} be a Cauchy sequence. We can choose n_1 < n_2 < ⋯ such that ‖x_n − x_m‖ < 2^{−j} for m, n ≥ n_j. Let y_1 = x_{n_1} and y_j = x_{n_j} − x_{n_{j−1}} for j > 1. Then Σ_1^k y_j = x_{n_k}, and

$$\sum_{j=1}^\infty \|y_j\| \le \|y_1\| + \sum_{j=1}^\infty 2^{-j} = \|y_1\| + 1 < \infty,$$

so lim x_{n_k} = Σ_1^∞ y_j exists. But since {x_n} is Cauchy, it is easily verified that {x_n} converges to the same limit as {x_{n_k}}. ∎
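The second half of the proof turns a Cauchy sequence into an absolutely convergent telescoping series. The sketch below carries this out in exact rational arithmetic for a hypothetical example in X = R, the Cauchy sequence x_n = Σ_{k=1}^n 1/k². Since Σ_{k>N} 1/k² < 1/N, the choice n_j = 2^j + 1 satisfies |x_n − x_m| < 2^{−j} for m, n ≥ n_j.

```python
from fractions import Fraction

def x_seq(n):
    """Hypothetical Cauchy sequence in R: x_n = sum_{k=1}^n 1/k**2, computed exactly."""
    return sum((Fraction(1, k * k) for k in range(1, n + 1)), Fraction(0))

def telescoped_terms(J):
    """Indices n_j = 2**j + 1 (j = 1, ..., J) and the terms y_1 = x_{n_1},
    y_j = x_{n_j} - x_{n_{j-1}} used in the proof of Theorem 5.1."""
    n = [2**j + 1 for j in range(1, J + 1)]
    ys = [x_seq(n[0])] + [x_seq(n[j]) - x_seq(n[j - 1]) for j in range(1, J)]
    return n, ys
```

The partial sums of the y_j reproduce x_{n_k} exactly, and Σ |y_j| stays below |y_1| + 1, as in the displayed estimate.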


We have already seen some examples of Banach spaces. First, if $X$ is a topological space, $B(X)$ and $BC(X)$ are Banach spaces with the uniform norm $\|f\|_u = \sup_{x \in X} |f(x)|$. Second, if $(X, \mathcal{M}, \mu)$ is a measure space, $L^1(\mu)$ is a Banach space with the $L^1$ norm $\|f\|_1 = \int |f|\,d\mu$. (Observe that $\|\cdot\|_1$ is only a seminorm if we think of $L^1(\mu)$ as consisting of individual functions, but it becomes a norm if we identify functions that are equal a.e.) That $L^1(\mu)$ is complete follows from Theorems 2.25 and 5.1. Indeed, if $\sum_1^\infty \|f_n\|_1 < \infty$, Theorem 2.25 shows that $f = \sum_1^\infty f_n$ exists a.e., and

$$\int \Big| f - \sum_1^N f_n \Big|\,d\mu \le \sum_{N+1}^\infty \int |f_n|\,d\mu \to 0 \quad \text{as } N \to \infty.$$

More examples will be found in Exercises 8-11 and in subsequent sections.
If $X$ and $Y$ are normed vector spaces, $X \times Y$ becomes a normed vector space when equipped with the product norm

$$\|(x, y)\| = \max(\|x\|, \|y\|).$$

(Here, of course, $\|x\|$ refers to the norm on $X$ while $\|y\|$ refers to the norm on $Y$.) Sometimes other norms equivalent to this one, such as $\|(x, y)\| = \|x\| + \|y\|$ or $\|(x, y)\| = (\|x\|^2 + \|y\|^2)^{1/2}$, are used instead.
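The equivalence of these three product norms comes with explicit constants. Writing $a = \|x\|$ and $b = \|y\|$, the chain below holds because $\max(a,b)^2 \le a^2 + b^2$, because $(a+b)^2 = a^2 + b^2 + 2ab \ge a^2 + b^2$, and trivially for the last step; so each of the three norms lies within a factor of 2 of the max norm.

```latex
% a = \|x\| (norm in X) and b = \|y\| (norm in Y), both nonnegative
\max(a, b) \;\le\; (a^2 + b^2)^{1/2} \;\le\; a + b \;\le\; 2\max(a, b)
```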

A related construction is that of quotient spaces. If $\mathcal{M}$ is a vector subspace of the vector space $X$, it defines an equivalence relation on $X$ as follows: $x \sim y$ iff $x - y \in \mathcal{M}$. The equivalence class of $x \in X$ is denoted by $x + \mathcal{M}$, and the set of equivalence classes, or quotient space, is denoted by $X/\mathcal{M}$. $X/\mathcal{M}$ is a vector space with vector operations $(x + \mathcal{M}) + (y + \mathcal{M}) = (x + y) + \mathcal{M}$ and $\lambda(x + \mathcal{M}) = (\lambda x) + \mathcal{M}$. If $X$ is a normed vector space and $\mathcal{M}$ is closed, $X/\mathcal{M}$ inherits a norm from $X$ called the quotient norm, namely

$$\|x + \mathcal{M}\| = \inf_{y \in \mathcal{M}} \|x + y\|.$$

See Exercise 12 for a more detailed discussion.
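As a concrete instance of the quotient norm, the sketch below assumes $X = \mathbb{R}^2$ with the Euclidean norm and $\mathcal{M} = \operatorname{span}\{(1,1)\}$; then $\|x + \mathcal{M}\|$ is the distance from $x$ to the diagonal, which elementary geometry gives as $|x_1 - x_2|/\sqrt{2}$. The infimum is computed by a ternary search (valid because $t \mapsto \|x + t(1,1)\|$ is convex), and the function name `quotient_norm_diag` is hypothetical.

```python
import math

def quotient_norm_diag(x1, x2, lo=-1e6, hi=1e6, iters=200):
    """||x + M|| = inf_t ||x + t(1, 1)||_2 in R^2, with M = span{(1, 1)}.

    t -> ||x + t(1, 1)||_2 is convex, so a ternary search on [lo, hi] finds its minimum
    (assuming, as in the examples below, that the minimizer lies inside [lo, hi]).
    """
    g = lambda t: math.hypot(x1 + t, x2 + t)
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if g(m1) < g(m2):
            hi = m2
        else:
            lo = m1
    return g((lo + hi) / 2)

print(quotient_norm_diag(3.0, 1.0), abs(3.0 - 1.0) / math.sqrt(2))
```

Replacing $x$ by $x + s(1,1)$ leaves the result unchanged, which is exactly the statement that the quotient norm depends only on the coset $x + \mathcal{M}$.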
A linear map $T : X \to Y$ between two normed vector spaces is called bounded if there exists $C \ge 0$ such that

$$\|Tx\| \le C\|x\| \quad \text{for all } x \in X.$$

This is different from the notion of boundedness for functions on a set, according to which $T$ would be bounded if $\|Tx\| \le C$ for all $x$. Clearly no nonzero linear map can satisfy the latter condition, since $T(\lambda x) = \lambda Tx$ for all scalars $\lambda$. The present definition means that $T$ is bounded on bounded subsets of $X$.


5.2 Proposition. If $X$ and $Y$ are normed vector spaces and $T : X \to Y$ is a linear map, the following are equivalent:


a. $T$ is continuous.
b. $T$ is continuous at $0$.
c. $T$ is bounded.


Proof. That (a) implies (b) is trivial. If $T$ is continuous at $0 \in X$, there is a neighborhood $U$ of $0$ such that $T(U) \subset \{y \in Y : \|y\| \le 1\}$, and $U$ must contain a ball $B = \{x \in X : \|x\| \le \delta\}$ about $0$; thus $\|Tx\| \le 1$ when $\|x\| \le \delta$. Since $T$ commutes with scalar multiplication, it follows that $\|Tx\| \le a\delta^{-1}$ whenever $\|x\| \le a$, that is, $\|Tx\| \le \delta^{-1}\|x\|$. This shows that (b) implies (c). Finally, if $\|Tx\| \le C\|x\|$ for all $x$, then $\|Tx_1 - Tx_2\| = \|T(x_1 - x_2)\| < \epsilon$ whenever $\|x_1 - x_2\| < C^{-1}\epsilon$, so that $T$ is continuous. ∎


If $X$ and $Y$ are normed vector spaces, we denote the space of all bounded linear maps from $X$ to $Y$ by $L(X, Y)$. It is easily verified that $L(X, Y)$ is a vector space and that the function $T \mapsto \|T\|$ defined by

$$\begin{aligned} \|T\| &= \sup\{\|Tx\| : \|x\| = 1\} \\ &= \sup\Big\{\frac{\|Tx\|}{\|x\|} : x \ne 0\Big\} \\ &= \inf\{C : \|Tx\| \le C\|x\| \text{ for all } x\} \end{aligned} \tag{5.3}$$

is a norm on $L(X, Y)$, called the operator norm (Exercise 2). We always assume $L(X, Y)$ to be equipped with this norm unless we specify otherwise.
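To see the first expression in (5.3) in action, the sketch below assumes $X = Y = \mathbb{R}^2$ with the Euclidean norm and $T$ given by a $2 \times 2$ matrix. It approximates $\sup\{\|Tx\| : \|x\| = 1\}$ by sampling the unit circle and compares the result with the standard linear-algebra formula $\|T\| = \sqrt{\lambda_{\max}(T^{\mathsf T}T)}$, which holds for this particular norm only. The function names are illustrative.

```python
import math

def op_norm_sampled(A, samples=20000):
    """Approximate sup{||Ax||_2 : ||x||_2 = 1} (first line of (5.3)) by sampling the unit circle."""
    best = 0.0
    for k in range(samples):
        t = 2 * math.pi * k / samples
        x0, x1 = math.cos(t), math.sin(t)            # a unit vector in R^2
        y0 = A[0][0] * x0 + A[0][1] * x1
        y1 = A[1][0] * x0 + A[1][1] * x1
        best = max(best, math.hypot(y0, y1))
    return best

def op_norm_exact(A):
    """For the Euclidean norm, ||A|| is the square root of the largest eigenvalue of A^T A."""
    (a, b), (c, d) = A
    p, q, r = a * a + c * c, a * b + c * d, b * b + d * d   # A^T A = [[p, q], [q, r]]
    lam_max = (p + r) / 2 + math.sqrt(((p - r) / 2) ** 2 + q * q)
    return math.sqrt(lam_max)

A = [[2.0, 1.0], [0.0, 1.0]]
print(op_norm_sampled(A), op_norm_exact(A))
```

Sampling can only underestimate the supremum, so agreement from below to within a small tolerance is the expected behavior.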


5.4 Proposition. If $Y$ is complete, so is $L(X, Y)$.


Proof. Let $\{T_n\}$ be a Cauchy sequence in $L(X, Y)$. If $x \in X$, then $\{T_n x\}$ is Cauchy in $Y$ because $\|T_n x - T_m x\| \le \|T_n - T_m\|\,\|x\|$. Define $T : X \to Y$ by $Tx = \lim T_n x$. We leave it to the reader (Exercise 3) to verify that $T \in L(X, Y)$ (in fact, $\|T\| = \lim \|T_n\|$) and that $\|T_n - T\| \to 0$. ∎








Another useful property of the operator norm is the following. If $T \in L(X, Y)$ and $S \in L(Y, Z)$, then

$$\|STx\| \le \|S\|\,\|Tx\| \le \|S\|\,\|T\|\,\|x\|,$$

so that $ST \in L(X, Z)$ and $\|ST\| \le \|S\|\,\|T\|$. In particular, $L(X, X)$ is an algebra. If $X$ is complete, $L(X, X)$ is in fact a Banach algebra: a Banach space that is also an algebra, such that the norm of a product is at most the product of the norms. (Another example of a Banach algebra is $BC(X)$, where $X$ is a topological space, with pointwise multiplication and the uniform norm.)
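The inequality $\|ST\| \le \|S\|\,\|T\|$ can be spot-checked on matrices. The sketch assumes $K^n$ carries the sup norm, for which the operator norm of a matrix is its largest absolute row sum (a standard fact about that particular norm). The fixed example shows the inequality can be strict; the random part is a sanity check, not a proof.

```python
import random

def matmul(S, T):
    """Matrix product ST (S is n x m, T is m x p), lists of lists."""
    return [[sum(S[i][k] * T[k][j] for k in range(len(T))) for j in range(len(T[0]))]
            for i in range(len(S))]

def op_norm_sup(A):
    """Operator norm of A acting on (K^n, sup norm): the largest absolute row sum."""
    return max(sum(abs(a) for a in row) for row in A)

S = [[1, -1], [0, 1]]
T = [[1, 1], [1, 1]]
print(op_norm_sup(matmul(S, T)), op_norm_sup(S) * op_norm_sup(T))   # strict inequality here

# Random spot check of ||ST|| <= ||S|| ||T|| on 3 x 3 integer matrices.
rng = random.Random(1)

def rand_matrix():
    return [[rng.randint(-5, 5) for _ in range(3)] for _ in range(3)]

ok = all(op_norm_sup(matmul(S3, T3)) <= op_norm_sup(S3) * op_norm_sup(T3)
         for S3, T3 in ((rand_matrix(), rand_matrix()) for _ in range(1000)))
```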

If $T \in L(X, Y)$, $T$ is said to be invertible, or an isomorphism, if $T$ is bijective and $T^{-1}$ is bounded (in other words, $\|Tx\| \ge C\|x\|$ for some $C > 0$). $T$ is called an isometry if $\|Tx\| = \|x\|$ for all $x \in X$. An isometry is injective but not necessarily surjective; it is, however, an isomorphism onto its range.





Exercises 


1. If $X$ is a normed vector space over $K$ ($= \mathbb{R}$ or $\mathbb{C}$), then addition and scalar multiplication are continuous from $X \times X$ and $K \times X$ to $X$. Moreover, the norm is continuous from $X$ to $[0, \infty)$; in fact, $\big|\,\|x\| - \|y\|\,\big| \le \|x - y\|$.


2. $L(X, Y)$ is a vector space and the function $\|\cdot\|$ defined by (5.3) is a norm on it. In particular, the three expressions on the right of (5.3) are always equal.


3. Complete the proof of Proposition 5.4.


4. If $X, Y$ are normed vector spaces, the map $(T, x) \mapsto Tx$ is continuous from $L(X, Y) \times X$ to $Y$. (That is, if $T_n \to T$ and $x_n \to x$ then $T_n x_n \to Tx$.)


5. If $X$ is a normed vector space, the closure of any subspace of $X$ is a subspace.


6. Suppose that $X$ is a finite-dimensional vector space. Let $e_1, \dots, e_n$ be a basis for $X$, and define $\|\sum_1^n a_j e_j\|_1 = \sum_1^n |a_j|$.
a. $\|\cdot\|_1$ is a norm on $X$.
b. The map $(a_1, \dots, a_n) \mapsto \sum_1^n a_j e_j$ is continuous from $K^n$ with the usual Euclidean topology to $X$ with the topology defined by $\|\cdot\|_1$.
c. $\{x \in X : \|x\|_1 = 1\}$ is compact in the topology defined by $\|\cdot\|_1$.
d. All norms on $X$ are equivalent. (Compare any norm to $\|\cdot\|_1$.)


7. Let $X$ be a Banach space.
a. If $T \in L(X, X)$ and $\|I - T\| < 1$ where $I$ is the identity operator, then $T$ is invertible; in fact, the series $\sum_0^\infty (I - T)^n$ converges in $L(X, X)$ to $T^{-1}$.
b. If $T \in L(X, X)$ is invertible and $\|S - T\| < \|T^{-1}\|^{-1}$, then $S$ is invertible. Thus the set of invertible operators is open in $L(X, X)$.


8. Let $(X, \mathcal{M})$ be a measurable space, and let $M(X)$ be the space of complex measures on $(X, \mathcal{M})$. Then $\|\mu\| = |\mu|(X)$ is a norm on $M(X)$ that makes $M(X)$ into a Banach space. (Use Theorem 5.1.)


9. Let $C^k([0, 1])$ be the space of functions on $[0, 1]$ possessing continuous derivatives up to order $k$ on $[0, 1]$, including one-sided derivatives at the endpoints.
a. If $f \in C([0, 1])$, then $f \in C^k([0, 1])$ iff $f$ is $k$ times continuously differentiable on $(0, 1)$ and $\lim_{x \searrow 0} f^{(j)}(x)$ and $\lim_{x \nearrow 1} f^{(j)}(x)$ exist for $j \le k$. (The mean value theorem is useful.)
b. $\|f\| = \sum_0^k \|f^{(j)}\|_u$ is a norm on $C^k([0, 1])$ that makes $C^k([0, 1])$ into a Banach space. (Use induction on $k$. The essential point is that if $\{f_n\} \subset C^1([0, 1])$, $f_n \to f$ uniformly, and $f_n' \to g$ uniformly, then $f \in C^1([0, 1])$ and $f' = g$. The easy way to prove this is to show that $f(x) - f(0) = \int_0^x g(t)\,dt$.)


10. Let $L^1_k([0, 1])$ be the space of all $f \in C^{k-1}([0, 1])$ such that $f^{(k-1)}$ is absolutely continuous on $[0, 1]$ (and hence $f^{(k)}$ exists a.e. and is in $L^1([0, 1])$). Then $\|f\| = \sum_0^k \int_0^1 |f^{(j)}(x)|\,dx$ is a norm on $L^1_k([0, 1])$ that makes $L^1_k([0, 1])$ into a Banach space. (See Exercise 9 and its hint.)


11. If $0 < \alpha \le 1$, let $\Lambda_\alpha([0, 1])$ be the space of Hölder continuous functions of exponent $\alpha$ on $[0, 1]$. That is, $f \in \Lambda_\alpha([0, 1])$ iff $\|f\|_{\Lambda_\alpha} < \infty$, where

$$\|f\|_{\Lambda_\alpha} = |f(0)| + \sup_{x, y \in [0, 1],\ x \ne y} \frac{|f(x) - f(y)|}{|x - y|^\alpha}.$$

a. $\|\cdot\|_{\Lambda_\alpha}$ is a norm that makes $\Lambda_\alpha([0, 1])$ into a Banach space.
b. Let $\lambda_\alpha([0, 1])$ be the set of all $f \in \Lambda_\alpha([0, 1])$ such that

$$\frac{|f(x) - f(y)|}{|x - y|^\alpha} \to 0 \quad \text{as } x \to y, \text{ for all } y \in [0, 1].$$

If $\alpha < 1$, $\lambda_\alpha([0, 1])$ is an infinite-dimensional closed subspace of $\Lambda_\alpha([0, 1])$. If $\alpha = 1$, $\lambda_\alpha([0, 1])$ contains only constant functions.


12. Let $X$ be a normed vector space and $\mathcal{M}$ a proper closed subspace of $X$.
a. $\|x + \mathcal{M}\| = \inf\{\|x + y\| : y \in \mathcal{M}\}$ is a norm on $X/\mathcal{M}$.
b. For any $\epsilon > 0$ there exists $x \in X$ such that $\|x\| = 1$ and $\|x + \mathcal{M}\| \ge 1 - \epsilon$.
c. The projection map $\pi(x) = x + \mathcal{M}$ from $X$ to $X/\mathcal{M}$ has norm 1.
d. If $X$ is complete, so is $X/\mathcal{M}$. (Use Theorem 5.1.)
e. The topology defined by the quotient norm is the quotient topology as defined in Exercise 28 in §4.2.


13. If $\|\cdot\|$ is a seminorm on the vector space $X$, let $\mathcal{M} = \{x \in X : \|x\| = 0\}$. Then $\mathcal{M}$ is a subspace, and the map $x + \mathcal{M} \mapsto \|x\|$ is a norm on $X/\mathcal{M}$.


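14. If $X$ is a normed vector space and $\mathcal{M}$ is a non-closed subspace, then $\|x + \mathcal{M}\|$, as defined in Exercise 12, is a seminorm on $X/\mathcal{M}$. If one divides by its nullspace as in Exercise 13, the resulting quotient space is isometrically isomorphic to $X/\overline{\mathcal{M}}$, where $\overline{\mathcal{M}}$ is the closure of $\mathcal{M}$. (Cf. Exercise 5.)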





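15. Suppose that $X$ and $Y$ are normed vector spaces and $T \in L(X, Y)$. Let $\mathcal{N}(T) = \{x \in X : Tx = 0\}$.
a. $\mathcal{N}(T)$ is a closed subspace of $X$.
b. There is a unique $S \in L(X/\mathcal{N}(T), Y)$ such that $T = S \circ \pi$ where $\pi : X \to X/\mathcal{N}(T)$ is the projection (see Exercise 12). Moreover, $\|S\| = \|T\|$.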





16. The purpose of this exercise is to develop a theory of integration for functions with values in a separable Banach space. Let $(X, \mathcal{M}, \mu)$ be a measure space, $Y$ a separable Banach space, $\mathcal{L}_Y$ the space of all $(\mathcal{M}, \mathcal{B}_Y)$-measurable maps from $X$ to $Y$, and $\mathcal{F}_Y$ the set of maps $f : X \to Y$ of the form $f(x) = \sum_1^n y_j \chi_{E_j}(x)$ where $n \in \mathbb{N}$, $y_j \in Y$, $E_j \in \mathcal{M}$, and $\mu(E_j) < \infty$. If $f \in \mathcal{L}_Y$, since $y \mapsto \|y\|$ is continuous (Exercise 1), $x \mapsto \|f(x)\|$ is $(\mathcal{M}, \mathcal{B}_{\mathbb{R}})$-measurable, and we define $\|f\|_1 = \int \|f(x)\|\,d\mu(x)$. Finally, let $\mathcal{L}^1_Y = \{f \in \mathcal{L}_Y : \|f\|_1 < \infty\}$.
a. $\mathcal{L}_Y$ is a vector space, $\mathcal{F}_Y$ and $\mathcal{L}^1_Y$ are subspaces of it, $\mathcal{F}_Y \subset \mathcal{L}^1_Y$, and $\|\cdot\|_1$ is a seminorm on $\mathcal{L}^1_Y$ that becomes a norm if we identify two functions that are equal a.e.
b. Let $\{y_n\}$ be a countable dense set in $Y$. Given $\epsilon > 0$, let $B_n^\epsilon = \{y \in Y : \|y - y_n\| < \epsilon\|y_n\|\}$. Then $\bigcup_n B_n^\epsilon \supset Y \setminus \{0\}$.
c. If $f \in \mathcal{L}^1_Y$, there is a sequence $\{h_n\} \subset \mathcal{F}_Y$ with $h_n \to f$ a.e. and $\|h_n - f\|_1 \to 0$. (With notation as in (b), let $A_{nj} = B_n^{1/j} \setminus \bigcup_{m=1}^{n-1} B_m^{1/j}$ and $E_{nj} = f^{-1}(A_{nj})$, and consider $g_j = \sum_{n=1}^\infty y_n \chi_{E_{nj}}$.)
d. There is a unique linear map $\int : \mathcal{L}^1_Y \to Y$ such that $\int y\chi_E = \mu(E)y$ for $y \in Y$ and $E \in \mathcal{M}$ ($\mu(E) < \infty$) and $\|\int f\| \le \|f\|_1$.
e. The dominated convergence theorem: If $\{f_n\}$ is a sequence in $\mathcal{L}^1_Y$ such that $f_n \to f$ a.e., and there exists $g \in L^1$ such that $\|f_n(x)\| \le g(x)$ for all $n$ and a.e. $x$, then $\int f_n \to \int f$.
f. If $Z$ is a separable Banach space, $T \in L(Y, Z)$, and $f \in \mathcal{L}^1_Y$, then $T \circ f \in \mathcal{L}^1_Z$ and $\int T \circ f = T(\int f)$.
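Exercise 7a can be explored numerically in a finite-dimensional Banach space. The sketch below assumes $X = \mathbb{R}^2$ with the sup norm, so that the operator norm is the largest absolute row sum; the matrix $T$ is a hypothetical example with $\|I - T\| = 0.2$. It sums the first terms of $\sum_0^\infty (I - T)^n$ and compares the result with $T^{-1}$ computed by the $2 \times 2$ inverse formula.

```python
def matmul(A, B):
    """Matrix product AB for lists of lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def row_sum_norm(A):
    """Operator norm on (R^n, sup norm): the largest absolute row sum."""
    return max(sum(abs(a) for a in row) for row in A)

def neumann_inverse(T, terms=60):
    """Partial sum of sum_{n>=0} (I - T)^n, which converges to T^{-1} when ||I - T|| < 1."""
    n = len(T)
    I = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    D = [[I[i][j] - T[i][j] for j in range(n)] for i in range(n)]      # D = I - T
    if row_sum_norm(D) >= 1:
        raise ValueError("hypothesis ||I - T|| < 1 fails for this norm")
    total = [row[:] for row in I]    # running sum, starting with (I - T)^0 = I
    power = [row[:] for row in I]
    for _ in range(terms):
        power = matmul(power, D)     # next power of I - T
        total = [[total[i][j] + power[i][j] for j in range(n)] for i in range(n)]
    return total

T = [[1.0, 0.2], [0.1, 0.9]]         # ||I - T|| = 0.2 in the row-sum norm
inv = neumann_inverse(T)
print(inv, matmul(T, inv))
```

The truncation error after $N$ terms is at most $\sum_{n > N} \|I - T\|^n$, which is why a modest number of terms suffices when $\|I - T\|$ is well below 1.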


5.2 LINEAR FUNCTIONALS 


Let $X$ be a vector space over $K$, where $K = \mathbb{R}$ or $\mathbb{C}$. A linear map from $X$ to $K$ is called a linear functional on $X$. If $X$ is a normed vector space, the space $L(X, K)$ of bounded linear functionals on $X$ is called the dual space of $X$ and is denoted by $X^*$. According to Proposition 5.4, $X^*$ is a Banach space with the operator norm.

If $X$ is a vector space over $\mathbb{C}$, it is also a vector space over $\mathbb{R}$, and we can consider both real and complex linear functionals on $X$, that is, maps $f : X \to \mathbb{R}$ that are linear over $\mathbb{R}$ and maps $f : X \to \mathbb{C}$ that are linear over $\mathbb{C}$. The relationship between the two is as follows:


5.5 Proposition. Let $X$ be a vector space over $\mathbb{C}$. If $f$ is a complex linear functional on $X$ and $u = \operatorname{Re} f$, then $u$ is a real linear functional, and $f(x) = u(x) - iu(ix)$ for all $x \in X$. Conversely, if $u$ is a real linear functional on $X$ and $f : X \to \mathbb{C}$ is defined by $f(x) = u(x) - iu(ix)$, then $f$ is complex linear. In this case, if $X$ is normed, we have $\|u\| = \|f\|$.


Proof. If $f$ is complex linear and $u = \operatorname{Re} f$, $u$ is clearly real linear and $\operatorname{Im} f(x) = -\operatorname{Re}[if(x)] = -u(ix)$, so $f(x) = u(x) - iu(ix)$. On the other hand, if $u$ is real linear and $f(x) = u(x) - iu(ix)$, then $f$ is clearly linear over $\mathbb{R}$, and $f(ix) = u(ix) - iu(-x) = u(ix) + iu(x) = if(x)$, so $f$ is also linear over $\mathbb{C}$. Finally, if $X$ is normed, since $|u(x)| = |\operatorname{Re} f(x)| \le |f(x)|$ we have $\|u\| \le \|f\|$. On the other hand, if $f(x) \ne 0$, let $\alpha = \overline{\operatorname{sgn} f(x)}$. Then $|f(x)| = \alpha f(x) = f(\alpha x) = u(\alpha x)$ (since $f(\alpha x) = |f(x)|$ is real), so $|f(x)| \le \|u\|\,\|\alpha x\| = \|u\|\,\|x\|$, whence $\|f\| \le \|u\|$. ∎
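Proposition 5.5 can be checked on a concrete functional. The sketch assumes $X = \mathbb{C}^2$ and a hypothetical functional $f(z) = a_1 z_1 + a_2 z_2$. It rebuilds $f$ from $u = \operatorname{Re} f$ via $f(x) = u(x) - iu(ix)$, and, assuming the sup norm $\|z\| = \max(|z_1|, |z_2|)$, estimates $\|u\|$ and $\|f\|$ by sampling the torus $|z_1| = |z_2| = 1$ (where both suprema are attained for this norm); both should be close to $|a_1| + |a_2|$.

```python
import cmath
import math

a = (2 - 1j, 0.5 + 3j)      # hypothetical coefficients: f(z) = a1*z1 + a2*z2 on C^2

def f(z):
    """A complex linear functional on C^2."""
    return a[0] * z[0] + a[1] * z[1]

def u(z):
    """u = Re f, a real linear functional on C^2 viewed as a real vector space."""
    return f(z).real

def f_from_u(z):
    """Recover f from its real part via f(x) = u(x) - i u(ix)."""
    return u(z) - 1j * u((1j * z[0], 1j * z[1]))

# ||u|| and ||f|| for the sup norm; both suprema are attained on the torus |z1| = |z2| = 1,
# which is sampled here on an N x N grid of angles.
N = 200
torus = [(cmath.exp(2j * math.pi * j / N), cmath.exp(2j * math.pi * k / N))
         for j in range(N) for k in range(N)]
u_norm = max(abs(u(z)) for z in torus)
f_norm = max(abs(f(z)) for z in torus)
print(u_norm, f_norm, abs(a[0]) + abs(a[1]))
```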


It is not obvious that there are any nonzero bounded linear functionals on an 
arbitrary normed vector space. The fact that such functionals exist in great abundance 
is one of the fundamental theorems of functional analysis. We shall now present this 
result in a more general form that has other important applications. 

If $X$ is a real vector space, a sublinear functional on $X$ is a map $p : X \to \mathbb{R}$ such that

$$p(x + y) \le p(x) + p(y) \quad \text{and} \quad p(\lambda x) = \lambda p(x) \quad \text{for all } x, y \in X \text{ and } \lambda \ge 0.$$

For example, every seminorm is a sublinear functional.


5.6 The Hahn-Banach Theorem. Let $X$ be a real vector space, $p$ a sublinear functional on $X$, $\mathcal{M}$ a subspace of $X$, and $f$ a linear functional on $\mathcal{M}$ such that $f(x) \le p(x)$ for all $x \in \mathcal{M}$. Then there exists a linear functional $F$ on $X$ such that $F(x) \le p(x)$ for all $x \in X$ and $F|\mathcal{M} = f$.
Proof. We begin by showing that if $x \in X \setminus \mathcal{M}$, $f$ can be extended to a linear functional $g$ on $\mathcal{M} + \mathbb{R}x$ satisfying $g \le p$ there. If $y_1, y_2 \in \mathcal{M}$, we have

$$f(y_1) + f(y_2) = f(y_1 + y_2) \le p(y_1 + y_2) \le p(y_1 - x) + p(x + y_2),$$

or

$$f(y_1) - p(y_1 - x) \le p(x + y_2) - f(y_2).$$

Hence

$$\sup\{f(y) - p(y - x) : y \in \mathcal{M}\} \le \inf\{p(x + y) - f(y) : y \in \mathcal{M}\}.$$

Let $\alpha$ be any number satisfying

$$\sup\{f(y) - p(y - x) : y \in \mathcal{M}\} \le \alpha \le \inf\{p(x + y) - f(y) : y \in \mathcal{M}\},$$

and define $g : \mathcal{M} + \mathbb{R}x \to \mathbb{R}$ by $g(y + \lambda x) = f(y) + \lambda\alpha$. Then $g$ is clearly linear, and $g|\mathcal{M} = f$, so that $g(y) \le p(y)$ for $y \in \mathcal{M}$. Moreover, if $\lambda > 0$ and $y \in \mathcal{M}$,

$$g(y + \lambda x) = \lambda[f(y/\lambda) + \alpha] \le \lambda[f(y/\lambda) + p(x + (y/\lambda)) - f(y/\lambda)] = p(y + \lambda x),$$

whereas if $\lambda = -\mu < 0$,

$$g(y + \lambda x) = \mu[f(y/\mu) - \alpha] \le \mu[f(y/\mu) - f(y/\mu) + p((y/\mu) - x)] = p(y + \lambda x).$$

Thus $g(z) \le p(z)$ for all $z \in \mathcal{M} + \mathbb{R}x$.

Evidently the same reasoning can be applied to any linear extension $F$ of $f$ satisfying $F \le p$ on its domain, and it shows that the domain of a maximal linear extension satisfying $F \le p$ must be the whole space $X$. But the family $\mathcal{F}$ of all linear extensions $F$ of $f$ satisfying $F \le p$ is partially ordered by inclusion (maps from subspaces of $X$ to $\mathbb{R}$ being regarded as subsets of $X \times \mathbb{R}$). Since the union of any increasing family of subspaces of $X$ is again a subspace, one easily sees that the union of a linearly ordered subfamily of $\mathcal{F}$ lies in $\mathcal{F}$. The proof is therefore completed by invoking Zorn's lemma. ∎
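The one-step extension in this proof can be watched in a tiny example where no appeal to Zorn's lemma is needed, because $\mathcal{M} + \mathbb{R}x$ is already the whole space. The sketch assumes $X = \mathbb{R}^2$, $p(v) = |v_1| + |v_2|$, $\mathcal{M} = \mathbb{R}(1, 0)$, $f(t, 0) = t$ and $x = (0, 1)$. It samples the supremum and infimum that bound $\alpha$ (here they are $-1$ and $1$), then checks on a grid that $g(y + \lambda x) = f(y) + \lambda\alpha$ stays below $p$ for an admissible $\alpha$ and fails for $\alpha = 1.5$.

```python
def p(v):
    """Sublinear functional on R^2 (here the l^1 norm): p(v) = |v1| + |v2|."""
    return abs(v[0]) + abs(v[1])

def f(y):
    """Linear functional on M = R(1, 0): f(t, 0) = t; it satisfies f <= p on M."""
    return y[0]

x = (0.0, 1.0)                               # direction not in M
grid = [k / 10 for k in range(-500, 501)]   # sampled points y = (t, 0) of M

# The two bounds from the proof; any alpha between them gives an extension g <= p.
lower = max(f((t, 0.0)) - p((t - x[0], 0.0 - x[1])) for t in grid)   # sup f(y) - p(y - x)
upper = min(p((x[0] + t, x[1] + 0.0)) - f((t, 0.0)) for t in grid)   # inf p(x + y) - f(y)

def extension(alpha):
    """g(y + lam*x) = f(y) + lam*alpha; since x = (0, 1), v = (v1, 0) + v2*x."""
    return lambda v: f((v[0], 0.0)) + v[1] * alpha

def dominated(g, pts):
    """Check g <= p on a finite sample of points (up to rounding)."""
    return all(g(v) <= p(v) + 1e-12 for v in pts)

pts = [(i / 4, j / 4) for i in range(-40, 41) for j in range(-40, 41)]
print(lower, upper, dominated(extension(0.5), pts), dominated(extension(1.5), pts))
```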


If $p$ is a seminorm and $f : X \to \mathbb{R}$ is linear, the inequality $f \le p$ is equivalent to the inequality $|f| \le p$, because $|f(x)| = \pm f(x) = f(\pm x)$ and $p(-x) = p(x)$. In this situation the Hahn-Banach theorem also applies to complex linear functionals:


5.7 The Complex Hahn-Banach Theorem. Let $X$ be a complex vector space, $p$ a seminorm on $X$, $\mathcal{M}$ a subspace of $X$, and $f$ a complex linear functional on $\mathcal{M}$ such that $|f(x)| \le p(x)$ for $x \in \mathcal{M}$. Then there exists a complex linear functional $F$ on $X$ such that $|F(x)| \le p(x)$ for all $x \in X$ and $F|\mathcal{M} = f$.


Proof. Let $u = \operatorname{Re} f$. By Theorem 5.6 there is a real linear extension $U$ of $u$ to $X$ such that $|U(x)| \le p(x)$ for all $x \in X$. Let $F(x) = U(x) - iU(ix)$ as in Proposition 5.5. Then $F$ is a complex linear extension of $f$, and as in the proof of Proposition 5.5, if $\alpha = \overline{\operatorname{sgn} F(x)}$ we have $|F(x)| = \alpha F(x) = F(\alpha x) = U(\alpha x) \le p(\alpha x) = p(x)$. ∎
From now on until §5.5, all of our results apply equally to real or complex vector spaces, but for the sake of definiteness we shall assume that the scalar field is $\mathbb{C}$. The principal applications of the Hahn-Banach theorem to normed vector spaces are summarized in the following theorem.


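5.8 Theorem. Let $X$ be a normed vector space.

a. If $\mathcal{M}$ is a closed subspace of $X$ and $x \in X \setminus \mathcal{M}$, there exists $f \in X^*$ such that $f(x) \ne 0$ and $f|\mathcal{M} = 0$. In fact, if $\delta = \inf_{y \in \mathcal{M}} \|x - y\|$, $f$ can be taken to satisfy $\|f\| = 1$ and $f(x) = \delta$.

b. If $x \in X$ and $x \ne 0$, there exists $f \in X^*$ such that $\|f\| = 1$ and $f(x) = \|x\|$.

c. The bounded linear functionals on $X$ separate points.

d. If $x \in X$, define $\hat{x} : X^* \to \mathbb{C}$ by $\hat{x}(f) = f(x)$. Then the map $x \mapsto \hat{x}$ is a linear isometry from $X$ into $X^{**}$ (the dual of $X^*$).


Proof. To prove (a), define $f$ on $\mathcal{M} + \mathbb{C}x$ by $f(y + \lambda x) = \lambda\delta$ ($y \in \mathcal{M}$, $\lambda \in \mathbb{C}$). Then $f(x) = \delta$, $f|\mathcal{M} = 0$, and for $\lambda \ne 0$, $|f(y + \lambda x)| = |\lambda|\delta \le |\lambda|\,\|\lambda^{-1}y + x\| = \|y + \lambda x\|$ (since $-\lambda^{-1}y \in \mathcal{M}$). Thus the Hahn-Banach theorem can be applied, with $p(x) = \|x\|$ and $\mathcal{M}$ replaced by $\mathcal{M} + \mathbb{C}x$. (b) is the special case of (a) with $\mathcal{M} = \{0\}$, and (c) follows immediately: if $x \ne y$, there exists $f \in X^*$ with $f(x - y) \ne 0$, i.e., $f(x) \ne f(y)$. As for (d), obviously $\hat{x}$ is a linear functional on $X^*$ and the map $x \mapsto \hat{x}$ is linear. Moreover, $|\hat{x}(f)| = |f(x)| \le \|f\|\,\|x\|$, so $\|\hat{x}\| \le \|x\|$. On the other hand, (b) implies that $\|\hat{x}\| \ge \|x\|$. ∎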
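The functional built in the proof of (a) can be computed explicitly in a small example. The sketch uses real scalars for simplicity (the theorem applies equally to real spaces) and assumes $X = \mathbb{R}^2$ with the $\ell^1$ norm $\|v\| = |v_1| + |v_2|$, $\mathcal{M} = \mathbb{R}(1, 1)$ and $x = (1, 0)$. Then $\delta = 1$, and writing $v = t(1,1) + \lambda(1,0)$ gives $\lambda = v_1 - v_2$, so $f(v) = (v_1 - v_2)\delta$. The code samples $\delta$ and $\|f\|$ numerically to confirm $f(x) = \delta$, $f|\mathcal{M} = 0$ and $\|f\| = 1$.

```python
def norm1(v):
    """The l^1 norm on R^2."""
    return abs(v[0]) + abs(v[1])

x = (1.0, 0.0)                                   # a point not in M = R(1, 1)
ts = [k / 1000 for k in range(-3000, 3001)]
delta = min(norm1((x[0] - t, x[1] - t)) for t in ts)   # delta = inf_{y in M} ||x - y||, sampled

def f(v):
    """f(y + lam*x) = lam*delta. Writing v = t(1, 1) + lam(1, 0) gives lam = v1 - v2."""
    return (v[0] - v[1]) * delta

# ||f|| = sup |f| over the l^1 unit sphere (the boundary of a diamond), sampled.
ss = [k / 1000 for k in range(-1000, 1001)]
sphere = [(s, sign * (1 - abs(s))) for s in ss for sign in (1, -1)]
f_norm = max(abs(f(v)) for v in sphere)
print(delta, f(x), f((2.0, 2.0)), f_norm)
```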

















With notation as in Theorem 5.8d, let $\hat{X} = \{\hat{x} : x \in X\}$. Since $X^{**}$ is always complete, the closure $\overline{\hat{X}}$ of $\hat{X}$ in $X^{**}$ is a Banach space, and the map $x \mapsto \hat{x}$ embeds $X$ into $\overline{\hat{X}}$ as a dense subspace. $\overline{\hat{X}}$ is called the completion of $X$. In particular, if $X$ is itself a Banach space then $\overline{\hat{X}} = \hat{X}$.

If $X$ is finite-dimensional, then of course $\hat{X} = X^{**}$, since these spaces have the same dimension. For infinite-dimensional Banach spaces it may or may not happen that $\hat{X} = X^{**}$; if it does, $X$ is called reflexive. The examples of Banach spaces we have examined so far are not reflexive except in trivial cases where they turn out to be finite-dimensional. We shall prove some cases of this assertion and present examples of reflexive Banach spaces in later sections.

Usually we shall identify $\hat{x}$ with $x$ and thus regard $X^{**}$ as a superspace of $X$; reflexivity then means that $X^{**} = X$.


Exercises 


17. A linear functional $f$ on a normed vector space $X$ is bounded iff $f^{-1}(\{0\})$ is closed. (Use Exercise 12b.)


18. Let $X$ be a normed vector space.
a. If $\mathcal{M}$ is a closed subspace and $x \in X \setminus \mathcal{M}$ then $\mathcal{M} + \mathbb{C}x$ is closed. (Use Theorem 5.8a.)
b. Every finite-dimensional subspace of $X$ is closed.


19. Let $X$ be an infinite-dimensional normed vector space.
a. There is a sequence $\{x_j\}$ in $X$ such that $\|x_j\| = 1$ for all $j$ and $\|x_j - x_k\| \ge \frac{1}{2}$ for $j \ne k$. (Construct $x_j$ inductively, using Exercises 12b and 18.)
b. $X$ is not locally compact.


20. If $\mathcal{M}$ is a finite-dimensional subspace of a normed vector space $X$, there is a closed subspace $\mathcal{N}$ such that $\mathcal{M} \cap \mathcal{N} = \{0\}$ and $\mathcal{M} + \mathcal{N} = X$.


21. If $X$ and $Y$ are normed vector spaces, define $\alpha : X^* \times Y^* \to (X \times Y)^*$ by $\alpha(f, g)(x, y) = f(x) + g(y)$. Then $\alpha$ is an isomorphism which is isometric if we use the norm $\|(x, y)\| = \max(\|x\|, \|y\|)$ on $X \times Y$, the corresponding operator norm on $(X \times Y)^*$, and the norm $\|(f, g)\| = \|f\| + \|g\|$ on $X^* \times Y^*$.


22. Suppose that $X$ and $Y$ are normed vector spaces and $T \in L(X, Y)$.
a. Define $T^\dagger : Y^* \to X^*$ by $T^\dagger f = f \circ T$. Then $T^\dagger \in L(Y^*, X^*)$ and $\|T^\dagger\| = \|T\|$. $T^\dagger$ is called the adjoint or transpose of $T$.
b. Applying the construction in (a) twice, one obtains $T^{\dagger\dagger} \in L(X^{**}, Y^{**})$. If $X$ and $Y$ are identified with their natural images $\hat{X}$ and $\hat{Y}$ in $X^{**}$ and $Y^{**}$, then $T^{\dagger\dagger}|X = T$.
c. $T^\dagger$ is injective iff the range of $T$ is dense in $Y$.
d. If the range of $T^\dagger$ is dense in $X^*$, then $T$ is injective; the converse is true if $X$ is reflexive.


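23. Suppose that $X$ is a Banach space. If $\mathcal{M}$ is a closed subspace of $X$ and $\mathcal{N}$ is a closed subspace of $X^*$, let $\mathcal{M}^0 = \{f \in X^* : f|\mathcal{M} = 0\}$ and $\mathcal{N}^\perp = \{x \in X : f(x) = 0 \text{ for all } f \in \mathcal{N}\}$. (Thus, if we identify $X$ with its image in $X^{**}$, $\mathcal{N}^\perp = \mathcal{N}^0 \cap X$.)
a. $\mathcal{M}^0$ and $\mathcal{N}^\perp$ are closed subspaces of $X^*$ and $X$, respectively.
b. $(\mathcal{M}^0)^\perp = \mathcal{M}$ and $(\mathcal{N}^\perp)^0 \supset \mathcal{N}$. If $X$ is reflexive, $(\mathcal{N}^\perp)^0 = \mathcal{N}$.
c. Let $\pi : X \to X/\mathcal{M}$ be the natural projection, and define $\alpha : (X/\mathcal{M})^* \to X^*$ by $\alpha(f) = f \circ \pi$. Then $\alpha$ is an isometric isomorphism from $(X/\mathcal{M})^*$ onto $\mathcal{M}^0$, where $X/\mathcal{M}$ has the quotient norm.
d. Define $\beta : X^* \to \mathcal{M}^*$ by $\beta(f) = f|\mathcal{M}$; then $\beta$ induces a map $\tilde{\beta} : X^*/\mathcal{M}^0 \to \mathcal{M}^*$ as in Exercise 15, and $\tilde{\beta}$ is an isometric isomorphism.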


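24. Suppose that $X$ is a Banach space.
a. Let $\hat{X}$, $\widehat{X^*}$ be the natural images of $X$, $X^*$ in $X^{**}$, $X^{***}$, and let $\hat{X}^0 = \{F \in X^{***} : F|\hat{X} = 0\}$. Then $\widehat{X^*} \cap \hat{X}^0 = \{0\}$ and $\widehat{X^*} + \hat{X}^0 = X^{***}$.
b. $X$ is reflexive iff $X^*$ is reflexive.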


25. If $X$ is a Banach space and $X^*$ is separable, then $X$ is separable. (Let $\{f_n\}$ be a countable dense subset of $X^*$. For each $n$ choose $x_n \in X$ with $\|x_n\| = 1$ and $|f_n(x_n)| \ge \frac{1}{2}\|f_n\|$. Then the linear combinations of $\{x_n\}_1^\infty$ are dense in $X$.) Note: Separability of $X$ does not imply separability of $X^*$.


26. Let $X$ be a real vector space and let $\mathcal{P}$ be a subset of $X$ such that (i) if $x, y \in \mathcal{P}$, then $x + y \in \mathcal{P}$, (ii) if $x \in \mathcal{P}$ and $\lambda \ge 0$, then $\lambda x \in \mathcal{P}$, (iii) if $x \in \mathcal{P}$ and $-x \in \mathcal{P}$, then $x = 0$. (Example: If $X$ is a space of real-valued functions, $\mathcal{P}$ can be the set of nonnegative functions in $X$.)
a. The relation $\le$ defined by $x \le y$ iff $y - x \in \mathcal{P}$ is a partial ordering on $X$.
b. (The Krein Extension Theorem) Suppose that $\mathcal{M}$ is a subspace of $X$ such that for each $x \in X$ there exists $y \in \mathcal{M}$ with $x \le y$. If $f$ is a linear functional on $\mathcal{M}$ such that $f(x) \ge 0$ for $x \in \mathcal{M} \cap \mathcal{P}$, there is a linear functional $F$ on $X$ such that $F(x) \ge 0$ for $x \in \mathcal{P}$ and $F|\mathcal{M} = f$. (Consider $p(x) = \inf\{f(y) : y \in \mathcal{M} \text{ and } x \le y\}$.)


5.3 THE BAIRE CATEGORY THEOREM AND ITS CONSEQUENCES 


In this section we present an important theorem about complete metric spaces and 
use it to obtain some fundamental results concerning linear maps between Banach 
spaces. 


5.9 The Baire Category Theorem. Let $X$ be a complete metric space.
a. If $\{U_n\}$ is a sequence of open dense subsets of $X$, then $\bigcap_1^\infty U_n$ is dense in $X$.
b. $X$ is not a countable union of nowhere dense sets.


Proof. For part (a), we must show that if $W$ is a nonempty open set in $X$, then $W$ intersects $\bigcap_1^\infty U_n$. Since $U_1 \cap W$ is open and nonempty, it contains a ball $B(r_1, x_1)$ whose closure also lies in $U_1 \cap W$, and we can assume that $0 < r_1 < 1$. For $n > 1$, we choose $x_n \in X$ and $r_n \in (0, \infty)$ inductively as follows: Having chosen $x_j$ and $r_j$ for $j < n$, we observe that $U_n \cap B(r_{n-1}, x_{n-1})$ is open and nonempty (because $U_n$ is open and dense), so we can choose $x_n, r_n$ so that $0 < r_n < 2^{-n}$ and $\overline{B(r_n, x_n)} \subset U_n \cap B(r_{n-1}, x_{n-1})$. Then if $n, m > N$, we see that $x_n, x_m \in B(r_N, x_N)$, and since $r_N \to 0$, the sequence $\{x_n\}$ is Cauchy. As $X$ is complete, $x = \lim x_n$ exists. Since $x_n \in B(r_N, x_N)$ for $n > N$ we have

$$x \in \overline{B(r_N, x_N)} \subset U_N \cap W$$

for all $N$ (for $N > 1$ because $\overline{B(r_N, x_N)} \subset U_N \cap B(r_1, x_1)$), and the proof is complete.
As for (b), if $\{E_n\}$ is a sequence of nowhere dense sets in $X$, then $\{(\overline{E_n})^c\}$ is a sequence of open dense sets. Since $\bigcap (\overline{E_n})^c \ne \varnothing$, we have $\bigcup E_n \subset \bigcup \overline{E_n} \ne X$. ∎


We remark that since the conclusions of the Baire category theorem are purely 
topological, it suffices for X to be homeomorphic to a complete metric space. For 
example, the theorem applies to X = (0,1), which is not complete with the usual 
metric but is homeomorphic to R. 

The name of this theorem comes from Baire's terminology for sets: If $X$ is a topological space, a set $E \subset X$ is of the first category, according to Baire, if $E$ is a countable union of nowhere dense sets; otherwise $E$ is of the second category. Thus Baire's theorem asserts that every complete metric space is of the second category in itself. A more modern and more descriptive synonym for "of the first category" is meager. The complement of a meager set is called residual.
The Baire category theorem is often used to prove existence results: One shows that
objects having a certain property exist by showing that the set of objects not having 
the property (within a suitable complete metric space) is meager. For example, one 
can prove the existence of nowhere differentiable continuous functions in this way; 
see Exercise 42. 

We turn to the applications of the Baire category theorem in the theory of linear maps. Some terminology: If $X$ and $Y$ are topological spaces, a map $f : X \to Y$ is called open if $f(U)$ is open in $Y$ whenever $U$ is open in $X$. If $X$ and $Y$ are metric spaces, this amounts to requiring that if $B$ is a ball centered at $x \in X$, then $f(B)$ contains a ball centered at $f(x)$. Specializing still further, if $X$ and $Y$ are normed linear spaces and $f$ is linear, then $f$ commutes with translations and dilations; it follows that $f$ is open iff $f(B)$ contains a ball centered at $0$ in $Y$ when $B$ is the ball of radius 1 about $0$ in $X$.


5.10 The Open Mapping Theorem. Let $X$ and $Y$ be Banach spaces. If $T \in L(X, Y)$ is surjective, then $T$ is open.


Proof. Let $B_r$ denote the (open) ball of radius $r$ about $0$ in $X$. By the preceding remarks, it will suffice to show that $T(B_1)$ contains a ball about $0$ in $Y$. Since $X = \bigcup_1^\infty B_n$ and $T$ is surjective, we have $Y = \bigcup_1^\infty T(B_n)$. But $Y$ is complete and the map $y \mapsto ny$ is a homeomorphism of $Y$ that maps $T(B_1)$ to $T(B_n)$, so Baire's theorem implies that $T(B_1)$ cannot be nowhere dense. That is, there exist $y_0 \in Y$ and $r > 0$ such that the ball $B(4r, y_0)$ is contained in $\overline{T(B_1)}$. Pick $y_1 = Tx_1 \in T(B_1)$ such that $\|y_1 - y_0\| < 2r$; then $B(2r, y_1) \subset B(4r, y_0) \subset \overline{T(B_1)}$, so if $\|y\| < 2r$,

$$y = Tx_1 + (y - y_1) \in \overline{T(x_1 + B_1)} \subset \overline{T(B_2)}.$$

Dividing both sides by 2, we conclude that there exists $r > 0$ such that if $\|y\| < r$ then $y \in \overline{T(B_1)}$. If we could replace $\overline{T(B_1)}$ by $T(B_1)$, perhaps shrinking $r$ at the same time, the proof would be complete; we now proceed to accomplish this.

Since $T$ commutes with dilations, it follows that if $\|y\| < r2^{-n}$, then $y \in \overline{T(B_{2^{-n}})}$. Suppose $\|y\| < r/2$; we can find $x_1 \in B_{1/2}$ such that $\|y - Tx_1\| < r/4$, and proceeding inductively, we can find $x_n \in B_{2^{-n}}$ such that $\|y - \sum_1^n Tx_j\| < r2^{-n-1}$. Since $X$ is complete, by Theorem 5.1 the series $\sum_1^\infty x_n$ converges, say to $x$. But then $\|x\| < \sum_1^\infty 2^{-n} = 1$ and $y = Tx$. In other words, $T(B_1)$ contains all $y$ with $\|y\| < r/2$, so we are done. ∎
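The second half of this proof is a successive-approximation scheme: an approximate solution with error at most half the size of the target is upgraded, by repeating it on the residual and summing a geometric series, to an exact solution. The sketch below imitates that scheme in $\mathbb{R}^2$ with the sup norm. The "approximate solver" $A$ (which inverts only the diagonal of $T$) is a hypothetical stand-in for the Baire-category step, chosen so that $\|I - TA\| = 1/2$; it is not part of the theorem.

```python
def apply(M, v):
    """Matrix-vector product Mv."""
    return [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(M))]

def sup_norm(v):
    return max(abs(t) for t in v)

T = [[2.0, 1.0], [1.0, 3.0]]        # the surjective map T : R^2 -> R^2
A = [[0.5, 0.0], [0.0, 1.0 / 3.0]]  # hypothetical approximate solver: inverts only the diagonal of T
# With the sup norm, ||I - TA|| = 1/2, so each step solves T dx = r up to half of ||r||.

def solve(y, steps=60):
    """Successive approximation as in the proof: x = dx_1 + dx_2 + ..., residual halves each step."""
    x = [0.0] * len(y)
    r = list(y)                      # residual r = y - T x
    residuals = [sup_norm(r)]
    for _ in range(steps):
        dx = apply(A, r)             # ||dx|| <= ||A|| ||r|| and ||r - T dx|| <= ||r|| / 2
        x = [a + b for a, b in zip(x, dx)]
        r = [a - b for a, b in zip(r, apply(T, dx))]
        residuals.append(sup_norm(r))
    return x, residuals

x, residuals = solve([1.0, -2.0])
print(x, residuals[:4])
```

As in the proof, the size of the solution is controlled by the geometric series: $\|x\| \le \|A\| \sum_n \|r_n\| \le 2\|A\|\,\|y\|$.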








5.11 Corollary. If $X$ and $Y$ are Banach spaces and $T \in L(X, Y)$ is bijective, then $T$ is an isomorphism; that is, $T^{-1} \in L(Y, X)$.


Proof. If $T$ is bijective, continuity of $T^{-1}$ is equivalent to the openness of $T$. ∎


For the next results we need some more terminology. If $X$ and $Y$ are normed vector spaces and $T$ is a linear map from $X$ to $Y$, we define the graph of $T$ to be

$$\Gamma(T) = \{(x, y) \in X \times Y : y = Tx\},$$

which is a subspace of $X \times Y$. (From a strict set-theoretic point of view, of course, $T$ and $\Gamma(T)$ are identical; the distinction is a psychological one.) We say that $T$ is closed if $\Gamma(T)$ is a closed subspace of $X \times Y$. Clearly, if $T$ is continuous, then $T$ is closed, and if $X$ and $Y$ are complete the converse is also true:


5.12 The Closed Graph Theorem. If $X$ and $Y$ are Banach spaces and $T : X \to Y$ is a closed linear map, then $T$ is bounded.


Proof. Let $\pi_1$ and $\pi_2$ be the projections of $\Gamma(T)$ onto $X$ and $Y$, that is, $\pi_1(x, Tx) = x$ and $\pi_2(x, Tx) = Tx$. Obviously $\pi_1 \in L(\Gamma(T), X)$ and $\pi_2 \in L(\Gamma(T), Y)$. Since $X$ and $Y$ are complete, so is $X \times Y$, and hence so is $\Gamma(T)$ since $T$ is closed. The map $\pi_1$ is a bijection from $\Gamma(T)$ to $X$, so by Corollary 5.11, $\pi_1^{-1}$ is bounded. But then $T = \pi_2 \circ \pi_1^{-1}$ is bounded. ∎


Continuity of a linear map $T : X \to Y$ means that if $x_n \to x$ then $Tx_n \to Tx$, whereas closedness means that if $x_n \to x$ and $Tx_n \to y$ then $y = Tx$. Thus the significance of the closed graph theorem is that in verifying that $Tx_n \to Tx$ when $x_n \to x$, we may assume that $Tx_n$ converges to something, and we need only to show that the limit is the right thing. This frequently saves a lot of trouble.

The completeness of X and Y was used in a crucial way in proving the open 
mapping theorem and hence also in proving the closed graph theorem. In fact, the 
conclusions of both of these theorems may fail if either X or Y is incomplete; see 
Exercises 29-31. 

Our final result in this section is a theorem of almost magical power that allows 
one to deduce uniform estimates from pointwise estimates in certain situations. 


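5.13 The Uniform Boundedness Principle. Suppose that $X$ and $Y$ are normed vector spaces and $\mathcal{A}$ is a subset of $L(X, Y)$.


a. If $\sup_{T \in \mathcal{A}} \|Tx\| < \infty$ for all $x$ in some nonmeager subset of $X$, then $\sup_{T \in \mathcal{A}} \|T\| < \infty$.


b. If $X$ is a Banach space and $\sup_{T \in \mathcal{A}} \|Tx\| < \infty$ for all $x \in X$, then $\sup_{T \in \mathcal{A}} \|T\| < \infty$.


Proof. Let

$$E_n = \{x \in X : \sup_{T \in \mathcal{A}} \|Tx\| \le n\} = \bigcap_{T \in \mathcal{A}} \{x \in X : \|Tx\| \le n\}.$$

Then the $E_n$'s are closed, so under the hypothesis of (a) some $E_n$ must contain a nontrivial closed ball $\overline{B(r, x_0)}$. But then $E_{2n} \supset \overline{B(r, 0)}$, for if $\|x\| \le r$, then $x_0 - x \in \overline{B(r, x_0)} \subset E_n$, so $x - x_0 \in E_n$ (since $E_n = -E_n$), and hence

$$\|Tx\| \le \|T(x - x_0)\| + \|Tx_0\| \le 2n \quad \text{for all } T \in \mathcal{A}.$$

In other words, $\|Tx\| \le 2n$ whenever $T \in \mathcal{A}$ and $\|x\| \le r$, so $\sup_{T \in \mathcal{A}} \|T\| \le 2n/r$. This proves (a), and (b) follows by the Baire category theorem. ∎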
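Completeness in (b) cannot simply be dropped. A standard counterexample, sketched below, takes $X$ to be the finitely supported sequences with the sup norm (not complete) and $T_n f = n f(n)$. Each $f$ has $\sup_n |T_n f| < \infty$ because only finitely many terms are nonzero, yet $\|T_n\| = n$; by part (a), $X$ must therefore be meager in itself. Sequences are stored as dictionaries `{index: value}`, a representation chosen only for this illustration.

```python
def T(n, f):
    """T_n f = n * f(n), for f a finitely supported sequence stored as {index: value}."""
    return n * f.get(n, 0.0)

def pointwise_sup(f):
    """sup_n |T_n f|: only indices in the support of f contribute, so it is a finite max."""
    return max([abs(T(n, f)) for n in f] + [0.0])

def op_norm(n):
    """||T_n|| for the sup norm: |T_n f| <= n sup|f|, with equality at the unit vector e_n."""
    e_n = {n: 1.0}
    return abs(T(n, e_n)) / max(abs(v) for v in e_n.values())

f = {1: 1.0, 5: -2.0, 7: 0.5}
print(pointwise_sup(f), [op_norm(n) for n in (1, 10, 100)])
```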
Exercises


27. There exist meager subsets of R whose complements have Lebesgue measure 
zero. 


28. The Baire category theorem remains true if $X$ is assumed to be an LCH space rather than a complete metric space. (The proof is similar; the substitute for completeness is Proposition 4.21.)


29. Let $Y = L^1(\mu)$ where $\mu$ is counting measure on $\mathbb{N}$, and let $X = \{f \in Y : \sum_1^\infty n|f(n)| < \infty\}$, equipped with the $L^1$ norm.
a. $X$ is a proper dense subspace of $Y$; hence $X$ is not complete.
b. Define $T : X \to Y$ by $Tf(n) = nf(n)$. Then $T$ is closed but not bounded.
c. Let $S = T^{-1}$. Then $S : Y \to X$ is bounded and surjective but not open.


30. Let $Y = C([0, 1])$ and $X = C^1([0, 1])$, both equipped with the uniform norm.
a. $X$ is not complete.
b. The map $(d/dx) : X \to Y$ is closed (see Exercise 9) but not bounded.


31. Let $X, Y$ be Banach spaces and let $S : X \to Y$ be an unbounded linear map (for the existence of which, see §5.6). Let $\Gamma(S)$ be the graph of $S$, a subspace of $X \times Y$.
a. $\Gamma(S)$ is not complete.
b. Define $T : X \to \Gamma(S)$ by $Tx = (x, Sx)$. Then $T$ is closed but not bounded.
c. $T^{-1} : \Gamma(S) \to X$ is bounded and surjective but not open.


32. Let $\|\cdot\|_1$ and $\|\cdot\|_2$ be norms on the vector space $X$ such that $\|\cdot\|_1 \le \|\cdot\|_2$. If $X$ is complete with respect to both norms, then the norms are equivalent.


33. There is no slowest rate of decay of the terms of an absolutely convergent series; that is, there is no sequence $\{a_n\}$ of positive numbers such that $\sum a_n|c_n| < \infty$ iff $\{c_n\}$ is bounded. (The set of bounded sequences is the space $B(\mathbb{N})$ of bounded functions on $\mathbb{N}$, and the set of absolutely summable sequences is $L^1(\mu)$ where $\mu$ is counting measure on $\mathbb{N}$. If such an $\{a_n\}$ exists, consider $T : B(\mathbb{N}) \to L^1(\mu)$ defined by $Tf(n) = a_n f(n)$. The set of $f$ such that $f(n) = 0$ for all but finitely many $n$ is dense in $L^1(\mu)$ but not in $B(\mathbb{N})$.)


34. With reference to Exercises 9 and 10, show that the inclusion map of $L^1_k([0, 1])$ into $C^{k-1}([0, 1])$ is continuous (a) by using the closed graph theorem, and (b) by direct calculation. (This is to illustrate the use of the closed graph theorem as a labor-saving device.)


35. Let $X$ and $Y$ be Banach spaces, $T \in L(X, Y)$, $\mathcal{N}(T) = \{x : Tx = 0\}$, and $\mathcal{M} = \operatorname{range}(T)$. Then $X/\mathcal{N}(T)$ is isomorphic to $\mathcal{M}$ iff $\mathcal{M}$ is closed. (See Exercise 15.)


36. Let $X$ be a separable Banach space and let $\mu$ be counting measure on $\mathbb{N}$. Suppose that $\{x_n\}_1^\infty$ is a countable dense subset of the unit ball of $X$, and define $T : L^1(\mu) \to X$ by $Tf = \sum_1^\infty f(n)x_n$.

a. $T$ is bounded.
b. $T$ is surjective.
c. $X$ is isomorphic to a quotient space of $L^1(\mu)$. (Use Exercise 35.)


37. Let $X$ and $Y$ be Banach spaces. If $T : X \to Y$ is a linear map such that $f \circ T \in X^*$ for every $f \in Y^*$, then $T$ is bounded.


38. Let $X$ and $Y$ be Banach spaces, and let $\{T_n\}$ be a sequence in $L(X, Y)$ such that $\lim T_n x$ exists for every $x \in X$. Let $Tx = \lim T_n x$; then $T \in L(X, Y)$.


39. Let $X, Y, Z$ be Banach spaces and let $B : X \times Y \to Z$ be a separately continuous bilinear map; that is, $B(x, \cdot) \in L(Y, Z)$ for each $x \in X$ and $B(\cdot, y) \in L(X, Z)$ for each $y \in Y$. Then $B$ is jointly continuous, that is, continuous from $X \times Y$ to $Z$. (Reduce the problem to proving that $\|B(x, y)\| \le C\|x\|\,\|y\|$ for some $C > 0$.)


40. (The Principle of Condensation of Singularities) Let $X$ and $Y$ be Banach spaces and $\{T_{jk} : j, k \in \mathbb{N}\} \subset L(X, Y)$. Suppose that for each $k$ there exists $x \in X$ such that $\sup\{\|T_{jk}x\| : j \in \mathbb{N}\} = \infty$. Then there is an $x$ (indeed, a residual set of $x$'s) such that $\sup\{\|T_{jk}x\| : j \in \mathbb{N}\} = \infty$ for all $k$.


41. Let X be a vector space of countably infinite dimension (that is, every element 
is a finite linear combination of members of a countably infinite linearly independent 
set). There is no norm on X with respect to which X is complete. (Given a norm on 
X, apply Exercise 18b and the Baire category theorem.) 


42. Let $E_n$ be the set of all $f \in C([0, 1])$ for which there exists $x_0 \in [0, 1]$ (depending on $f$) such that $|f(x) - f(x_0)| \le n|x - x_0|$ for all $x \in [0, 1]$.
a. $E_n$ is nowhere dense in $C([0, 1])$. (Any real $f \in C([0, 1])$ can be uniformly approximated by a piecewise linear function $g$ whose linear pieces, finite in number, have slope $\pm 2n$. If $\|h - g\|_u$ is sufficiently small, then $h \notin E_n$.)
b. The set of nowhere differentiable functions is residual in $C([0, 1])$.


5.4 TOPOLOGICAL VECTOR SPACES 


It is frequently useful to consider topologies on vector spaces other than those defined by norms, the only crucial requirement being that the topology should be well behaved with respect to the vector operations. Precisely, a topological vector space is a vector space $X$ over the field $K$ ($= \mathbb{R}$ or $\mathbb{C}$) which is endowed with a topology such that the maps $(x, y) \mapsto x + y$ and $(\lambda, x) \mapsto \lambda x$ are continuous from $X \times X$ and $K \times X$ to $X$. A topological vector space is called locally convex if there is a base for the topology consisting of convex sets (that is, sets $A$ such that if $x, y \in A$ then $tx + (1 - t)y \in A$ for $0 < t < 1$). Most topological vector spaces that arise in practice are locally convex and Hausdorff.

The most common way of defining locally convex topologies on vector spaces is 
in terms of seminorms. Namely, if we are given a family of seminorms on X, the 
“balls” that they define can be used to generate a topology in the same way that the 
balls defined by a norm generate the topology on a normed vector space. The precise
result is as follows: 


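5.14 Theorem. Let $\{p_\alpha\}_{\alpha \in A}$ be a family of seminorms on the vector space $X$. If $x \in X$, $\alpha \in A$, and $\epsilon > 0$, let

$$U_{x\alpha\epsilon} = \{y \in X : p_\alpha(y - x) < \epsilon\},$$

and let $\mathcal{T}$ be the topology generated by the sets $U_{x\alpha\epsilon}$.
a. For each $x \in X$, the finite intersections of the sets $U_{x\alpha\epsilon}$ ($\alpha \in A$, $\epsilon > 0$) form a neighborhood base at $x$.
b. If $\langle x_i \rangle_{i \in I}$ is a net in $X$, then $x_i \to x$ iff $p_\alpha(x_i - x) \to 0$ for all $\alpha \in A$.
c. $(X, \mathcal{T})$ is a locally convex topological vector space.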


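Proof. (a) If $x \in \bigcap_1^n U_{x_j\alpha_j\epsilon_j}$, let $\delta_j = \epsilon_j - p_{\alpha_j}(x - x_j)$. By the triangle inequality, we have $x \in \bigcap_1^n U_{x\alpha_j\delta_j} \subset \bigcap_1^n U_{x_j\alpha_j\epsilon_j}$. Thus the assertion follows from Proposition 4.4.

(b) In view of (a), it suffices to observe that $p_\alpha(x_i - x) \to 0$ iff $\langle x_i \rangle$ is eventually in $U_{x\alpha\epsilon}$ for every $\epsilon > 0$.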

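(c) The continuity of the vector operations follows easily from Proposition 4.19 and part (b). Indeed, if $x_i \to x$ and $y_i \to y$, then

$$p_\alpha\big((x_i + y_i) - (x + y)\big) \le p_\alpha(x_i - x) + p_\alpha(y_i - y) \to 0,$$

so $x_i + y_i \to x + y$. If also $\lambda_i \to \lambda$, then eventually $|\lambda_i| \le C = |\lambda| + 1$, so

$$p_\alpha(\lambda_i x_i - \lambda x) \le p_\alpha\big(\lambda_i(x_i - x)\big) + p_\alpha\big((\lambda_i - \lambda)x\big) \le C p_\alpha(x_i - x) + |\lambda_i - \lambda|\, p_\alpha(x),$$

and it follows that $\lambda_i x_i \to \lambda x$. Moreover, the sets $U_{x\alpha\epsilon}$ are convex, for if $y, z \in U_{x\alpha\epsilon}$ and $0 < t < 1$, then

$$p_\alpha\big(x - [ty + (1-t)z]\big) \le p_\alpha(tx - ty) + p_\alpha\big((1-t)x - (1-t)z\big) < t\epsilon + (1-t)\epsilon = \epsilon.$$

The local convexity of the topology therefore follows from (a). ∎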


In this context there is an analogue of Proposition 5.2: 


5.15 Proposition. Suppose $X$ and $Y$ are vector spaces with topologies defined, respectively, by the families $\{p_\alpha\}_{\alpha \in A}$ and $\{q_\beta\}_{\beta \in B}$ of seminorms, and $T: X \to Y$ is a linear map. Then $T$ is continuous iff for each $\beta \in B$ there exist $\alpha_1, \ldots, \alpha_k \in A$ and $C > 0$ such that $q_\beta(Tx) \le C\sum_1^k p_{\alpha_j}(x)$.


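Proof. If the latter condition holds and $\langle x_i \rangle$ is a net converging to $x \in X$, by Theorem 5.14b we have $p_\alpha(x_i - x) \to 0$ for all $\alpha$, hence $q_\beta(Tx_i - Tx) \to 0$ for all $\beta$, hence $Tx_i \to Tx$. By Proposition 4.19, $T$ is continuous. Conversely, if $T$ is continuous, for every $\beta \in B$ there is a neighborhood $U$ of $0$ in $X$ such that $q_\beta(Tx) < 1$ for $x \in U$. By Theorem 5.14a we may assume that $U = \bigcap_1^k U_{0\alpha_j\epsilon_j}$. Let $\epsilon = \min(\epsilon_1, \ldots, \epsilon_k)$; then $q_\beta(Tx) < 1$ whenever $p_{\alpha_j}(x) < \epsilon$ for all $j$, and hence (applying this to $sx$ for $0 < s < 1$ and letting $s \to 1$) $q_\beta(Tx) \le 1$ whenever $p_{\alpha_j}(x) \le \epsilon$ for all $j$. Now, given $x \in X$, there are two possibilities. If $p_{\alpha_j}(x) > 0$ for some $j$, let $y = \epsilon x / \sum_1^k p_{\alpha_j}(x)$. Then $p_{\alpha_j}(y) \le \epsilon$ for all $j$, so

$$q_\beta(Tx) = \epsilon^{-1}\Big(\sum_1^k p_{\alpha_j}(x)\Big) q_\beta(Ty) \le \epsilon^{-1}\sum_1^k p_{\alpha_j}(x).$$

On the other hand, if $p_{\alpha_j}(x) = 0$ for all $j$, then $p_{\alpha_j}(rx) = 0$ for all $j$ and all $r > 0$, hence $r q_\beta(Tx) = q_\beta(T(rx)) < 1$ for all $r > 0$, hence $q_\beta(Tx) = 0$. Thus $q_\beta(Tx) \le \epsilon^{-1}\sum_1^k p_{\alpha_j}(x)$ in this case too, and we are done. ∎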


The proof of the following proposition is left to the reader (Exercise 43). 


5.16 Proposition. Let $X$ be a vector space equipped with the topology defined by a family $\{p_\alpha\}_{\alpha \in A}$ of seminorms.
a. $X$ is Hausdorff iff for each $x \ne 0$ there exists $\alpha \in A$ such that $p_\alpha(x) \ne 0$.
b. If $X$ is Hausdorff and $A$ is countable, then $X$ is metrizable with a translation-invariant metric (i.e., $\rho(x,y) = \rho(x+z, y+z)$ for all $x, y, z \in X$).
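To make part (b) concrete, here is a minimal numerical sketch of one standard choice of metric, $\rho(x,y) = \sum_{k \ge 1} 2^{-k} p_k(x-y)/(1 + p_k(x-y))$. This formula is our assumption, since Exercise 43 leaves the construction to the reader. The sketch takes vectors in $\mathbb{R}^N$ as stand-ins for truncated sequences and uses the illustrative seminorms $p_k(x) = \max_{j \le k}|x_j|$. Because $\rho$ depends only on $x - y$, translation invariance is built in; the code merely confirms it, together with the triangle inequality, on random samples.

```python
import random

N = 8  # number of coordinates kept (an illustrative truncation)


def seminorm(k, x):
    """Illustrative seminorm p_k(x) = max(|x_1|, ..., |x_k|), for k = 1, ..., N."""
    return max(abs(v) for v in x[:k])


def rho(x, y):
    """rho(x, y) = sum_k 2^{-k} p_k(x - y) / (1 + p_k(x - y)).

    It depends only on x - y, so it is translation invariant by construction.
    """
    diff = [a - b for a, b in zip(x, y)]
    total = 0.0
    for k in range(1, N + 1):
        p = seminorm(k, diff)
        total += 2.0 ** (-k) * p / (1.0 + p)
    return total


rng = random.Random(0)
samples = [[rng.uniform(-5.0, 5.0) for _ in range(N)] for _ in range(15)]
```

The triangle inequality for $\rho$ comes from two facts: $t \mapsto t/(1+t)$ is increasing and subadditive on $[0,\infty)$, and each $p_k$ satisfies the triangle inequality.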


If $X$ has the topology defined by the seminorms $\{p_\alpha\}_{\alpha \in A}$, by Proposition 5.15 a linear functional $f$ on $X$ is continuous iff $|f(x)| \le C\sum_1^k p_{\alpha_j}(x)$ for some $C > 0$ and $\alpha_1, \ldots, \alpha_k \in A$. Since a finite sum of seminorms is again a seminorm, the Hahn-Banach theorem guarantees the existence of lots of continuous linear functionals on $X$, enough to separate points if $X$ is Hausdorff. The set of all such functionals is denoted, as before, by $X^*$. There are various ways of making $X^*$ into a topological vector space, but we shall not consider this question systematically. The simplest way is to impose the weakest topology that makes all the evaluation maps $f \mapsto f(x)$ ($x \in X$) continuous, an idea that we shall discuss further below.

In a topological vector space $X$ the notion of Cauchy sequence or Cauchy net makes sense. Namely, a net $\langle x_i \rangle_{i \in I}$ in $X$ is called Cauchy if the net $\langle x_i - x_j \rangle_{(i,j) \in I \times I}$ converges to zero. (Here $I \times I$ is directed in the usual way: $(i,j) \precsim (i',j')$ iff $i \precsim i'$ and $j \precsim j'$.) Naturally, $X$ is called complete if every Cauchy net converges. Completeness is of most interest when $X$ is first countable, in which case it is equivalent to the condition that every Cauchy sequence converges (Exercise 44). More particularly, if $X$ is Hausdorff and its topology is defined by a countable family of seminorms, then this topology is first countable by Theorem 5.14a; indeed, it is given by a translation-invariant metric $\rho$ by Proposition 5.16b, and a sequence is Cauchy according to the definition just given iff it is Cauchy with respect to $\rho$. A complete Hausdorff topological vector space whose topology is defined by a countable family of seminorms is called a Fréchet space.

Let us now consider some interesting examples of topological vector spaces whose 
topologies are defined by families of seminorms rather than by single norms. We 
have already met a couple of them in previous chapters: 


• Let $X$ be an LCH space. On $C(X)$, the topology of uniform convergence on compact sets is defined by the seminorms $p_K(f) = \sup_{x \in K}|f(x)|$ as $K$ ranges over compact subsets of $X$. If $X$ is $\sigma$-compact and $\{U_n\}$ are as in Propositions 4.39 and 4.40, this topology is defined by the seminorms $p_n(f) = \sup_{x \in \overline{U}_n}|f(x)|$. In this case, C* is easily seen to be complete, so it
is a Fréchet space; by Proposition 4.38, so is C(X). 


• The space $L^1_{\mathrm{loc}}(\mathbb{R}^n)$, defined in §3.4, is a Fréchet space with the topology defined by the seminorms $p_k(f) = \int_{|x| \le k}|f(x)|\,dx$. (Completeness follows easily from the completeness of $L^1$.) An obvious generalization of this construction yields a locally convex topological vector space $L^1_{\mathrm{loc}}(X, \mu)$ where $X$ is any LCH space and $\mu$ is a Borel measure on $X$ that is finite on compact sets.


Another class of topological vector spaces arises naturally in connection with the theory of differential equations. One often wishes to study the operator $d/dx$, or more complicated operators constructed from it, acting on various spaces of functions. Unfortunately, it is virtually impossible to define norms on most infinite-dimensional function spaces so that $d/dx$ becomes a bounded operator. Here is one precise result along these lines: There is no norm on the space $C^\infty([0,1])$ of infinitely differentiable functions on $[0,1]$ with respect to which $d/dx$ is bounded. Indeed, if $f_\lambda(x) = e^{\lambda x}$, then $(d/dx)f_\lambda = \lambda f_\lambda$, so $\|d/dx\| \ge |\lambda|$ for all $\lambda$ no matter what norm is used on $C^\infty([0,1])$.

In view of this difficulty, three courses of action are available. First, one can consider differentiation as an unbounded operator from $X$ to $Y$ where $Y$ is a suitable Banach space and $X$ is a dense subspace of $Y$, as in Exercise 30. Second, one can consider differentiation as a bounded linear map from one Banach space $X$ to a different one $Y$, such as $X = C^k([0,1])$ and $Y = C^{k-1}([0,1])$ in Exercise 9. Finally, one can consider differentiation as a continuous operator on a locally convex space $X$ whose topology is not given by a norm. All of these points of view have their uses, but it is the last one that concerns us here. It is easy to construct families of seminorms on spaces of smooth functions such that differentiation becomes continuous almost by definition. For example, the seminorms $p_k(f) = \sup_{0 \le x \le 1}|f^{(k)}(x)|$ ($k = 0, 1, 2, \ldots$) make $C^\infty([0,1])$ into a Fréchet space (the completeness is proved as in Exercise 9), and $d/dx$ is continuous on this space by Proposition 5.15 since $p_k(f') = p_{k+1}(f)$. Other examples are considered in Exercise 45 and in Chapter 9.
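The identity $p_k(f') = p_{k+1}(f)$ is what makes $d/dx$ continuous in Proposition 5.15 (take $C = 1$ and a single seminorm). The small sketch below checks it under two simplifying assumptions: $f$ is a polynomial with rational coefficients, so that derivatives are exact, and each supremum over $[0,1]$ is replaced by a maximum over a finite grid. Both the grid size and the example polynomial are arbitrary choices.

```python
from fractions import Fraction


def derivative(coeffs):
    """Coefficients [c0, c1, c2, ...] of f(x) = sum c_j x^j  ->  coefficients of f'."""
    return [j * c for j, c in enumerate(coeffs)][1:]


def nth_derivative(coeffs, k):
    for _ in range(k):
        coeffs = derivative(coeffs)
    return coeffs


def evaluate(coeffs, x):
    """Horner's rule; the empty list represents the zero polynomial."""
    result = 0
    for c in reversed(coeffs):
        result = result * x + c
    return result


# Sample points of [0, 1]; replacing sup by a max over this grid is a simplification.
GRID = [Fraction(i, 200) for i in range(201)]


def p(k, coeffs):
    """Grid version of p_k(f) = sup_{0 <= x <= 1} |f^(k)(x)|."""
    dk = nth_derivative(coeffs, k)
    return max(abs(evaluate(dk, x)) for x in GRID)


f = [1, -3, 0, 2, 5]  # f(x) = 1 - 3x + 2x^3 + 5x^4 (an arbitrary example)
print([p(k, derivative(f)) == p(k + 1, f) for k in range(6)])
```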

One of the most useful procedures for constructing topologies on vector spaces is by requiring the continuity of certain linear maps. Namely, suppose that $X$ is a vector space, $Y$ is a normed linear space, and $\{T_\alpha\}_{\alpha \in A}$ is a collection of linear maps from $X$ to $Y$. Then the weak topology $\mathcal{T}$ generated by $\{T_\alpha\}$ makes $X$ into a locally convex topological vector space. Indeed, $\mathcal{T}$ is just the topology $\mathcal{T}'$ defined by the seminorms $p_\alpha(x) = \|T_\alpha x\|$ according to Theorem 5.14. ($\mathcal{T}$ is generated by sets of the form $\{x : \|T_\alpha x - y_0\| < \epsilon\}$ with $y_0 \in Y$, whereas $\mathcal{T}'$ is generated by sets of the form $\{x : \|T_\alpha x - T_\alpha x_0\| < \epsilon\}$ with $x_0 \in X$. If the $T_\alpha$'s are surjective, these are obviously the same; the general case is left as Exercise 46.) The topology on $C^\infty([0,1])$ in the preceding paragraph is an example of this construction, with $Y = C([0,1])$ and $T_k f = f^{(k)}$. We now present some more.

First, let $X$ be a normed vector space. The weak topology generated by $X^*$ is known simply as the weak topology on $X$, and convergence with respect to this topology is known as weak convergence. Thus, if $\langle x_\alpha \rangle$ is a net in $X$, $x_\alpha \to x$ weakly iff $f(x_\alpha) \to f(x)$ for all $f \in X^*$. When $X$ is infinite-dimensional, the weak topology is always weaker than the norm topology; see Exercise 49.

Next, let $X$ be a normed vector space and $X^*$ its dual space. The weak topology on $X^*$ as defined above is the topology generated by $X^{**}$; of more interest is the topology generated by $X$ (considered as a subspace of $X^{**}$), which is called the weak* topology (read "weak star topology") on $X^*$. $X^*$ is a space of functions on $X$, and the weak* topology is simply the topology of pointwise convergence: $f_\alpha \to f$ iff $f_\alpha(x) \to f(x)$ for all $x \in X$. The weak* topology is even weaker than the weak topology on $X^*$; the two coincide precisely when $X$ is reflexive.

Finally, let $X$ and $Y$ be Banach spaces. The topology on $L(X,Y)$ generated by the evaluation maps $T \mapsto Tx$ ($x \in X$) is called the strong operator topology on $L(X,Y)$, and the topology generated by the linear functionals $T \mapsto f(Tx)$ ($x \in X$, $f \in Y^*$) is called the weak operator topology on $L(X,Y)$. Again, these topologies are best understood in terms of convergence: $T_\alpha \to T$ strongly iff $T_\alpha x \to Tx$ in the norm topology of $Y$ for each $x \in X$, whereas $T_\alpha \to T$ weakly iff $T_\alpha x \to Tx$ in the weak topology of $Y$ for each $x \in X$. Thus the strong operator topology is stronger than the weak operator topology but weaker than the norm topology on $L(X,Y)$.

The following result concerning strong convergence is almost trivial but extremely 
useful: 


5.17 Proposition. Suppose $\{T_n\} \subset L(X,Y)$, $\sup_n\|T_n\| < \infty$, and $T \in L(X,Y)$. If $\|T_n x - Tx\| \to 0$ for all $x$ in a dense subset $D$ of $X$, then $T_n \to T$ strongly.


Proof. Let $C = \sup\{\|T\|, \|T_1\|, \|T_2\|, \ldots\}$. Given $x \in X$ and $\epsilon > 0$, choose $x' \in D$ such that $\|x - x'\| < \epsilon/3C$. If $n$ is large enough so that $\|T_n x' - Tx'\| < \epsilon/3$, we have

$$\|T_n x - Tx\| \le \|T_n x - T_n x'\| + \|T_n x' - Tx'\| + \|Tx' - Tx\| < 2C\|x - x'\| + \tfrac{1}{3}\epsilon < \epsilon,$$

so that $T_n x \to Tx$. ∎
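Proposition 5.17 can be seen at work on the left shifts $L_k$ of Exercise 64a. In the minimal sketch below, elements of $l^2 = l^2(\mathbb{N})$ are assumed to be finitely supported and are stored as dictionaries $\{n: a_n\}$; the helper names are ours. Finitely supported sequences form a dense subset, $\|L_k\| \le 1$, and $L_k x = 0$ as soon as $k$ exceeds the support of $x$, so $L_k \to 0$ strongly by Proposition 5.17. On the other hand $\|L_k u_{k+1}\| = 1$, so $L_k$ does not tend to $0$ in norm.

```python
import math

# Finitely supported elements of l^2 = l^2(N), stored as {n: a_n} with n = 1, 2, ...


def norm(x):
    return math.sqrt(sum(abs(a) ** 2 for a in x.values()))


def left_shift(k, x):
    """L_k(sum a_n u_n) = sum_{n > k} a_n u_{n-k}, as in Exercise 64a."""
    return {n - k: a for n, a in x.items() if n > k}


def basis_vector(n):
    return {n: 1.0}


x = {1: 3.0, 2: -1.0, 5: 2.0}
# Strong convergence on this x: the norms drop to 0 once k >= 5.
strong = [norm(left_shift(k, x)) for k in range(1, 8)]
# No norm convergence: ||L_k|| >= ||L_k u_{k+1}|| = 1 for every k.
witness = [norm(left_shift(k, basis_vector(k + 1))) for k in range(1, 8)]
```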


Our final result in this section is a compactness theorem that is one of the main 
reasons for the usefulness of the weak* topology on a dual space. The idea of the 
proof is similar to the techniques discussed in §4.8. 


5.18 Alaoglu's Theorem. If $X$ is a normed vector space, the closed unit ball $B^* = \{f \in X^* : \|f\| \le 1\}$ in $X^*$ is compact in the weak* topology.


Proof. For each $x \in X$ let $D_x = \{z \in \mathbb{C} : |z| \le \|x\|\}$, and let $D = \prod_{x \in X} D_x$. Then $D$ is compact by Tychonoff's theorem. The elements of $D$ are precisely those complex-valued functions $\phi$ on $X$ such that $|\phi(x)| \le \|x\|$ for all $x \in X$, and $B^*$ consists of those elements of $D$ that are linear. Moreover, the relative topologies that $B^*$ inherits from the product topology on $D$ and the weak* topology on $X^*$ both coincide with the topology of pointwise convergence, so it suffices to see that $B^*$ is closed in $D$. But this is easy: If $\langle f_\alpha \rangle$ is a net in $B^*$ that converges to $f \in D$, for any $x, y \in X$ and $a, b \in \mathbb{C}$ we have

$$f(ax + by) = \lim f_\alpha(ax + by) = \lim\,[a f_\alpha(x) + b f_\alpha(y)] = af(x) + bf(y),$$

so that $f \in B^*$. ∎


Warning: Alaoglu’s theorem does not imply that X* is locally compact in the 
weak* topology; see Exercise 49b. 


Exercises 
43. Prove Proposition 5.16. (For part (b), proceed as in Exercise 56d in §4.5.)


44. If X is a first countable topological vector space and every Cauchy sequence in 
X converges, then every Cauchy net in X converges. 


45. The space $C^\infty(\mathbb{R})$ of all infinitely differentiable functions on $\mathbb{R}$ has a Fréchet space topology with respect to which $f_n \to f$ iff $f_n^{(k)} \to f^{(k)}$ uniformly on compact sets for all $k \ge 0$.


46. If $X$ is a vector space, $Y$ a normed linear space, $\mathcal{T}$ the weak topology on $X$ generated by a family of linear maps $\{T_\alpha : X \to Y\}$, and $\mathcal{T}'$ the topology defined by the seminorms $\{x \mapsto \|T_\alpha x\|\}$, then $\mathcal{T} = \mathcal{T}'$.








47. Suppose that X and Y are Banach spaces. 
a. If $\{T_n\} \subset L(X,Y)$ and $T_n \to T$ weakly (or strongly), then $\sup_n\|T_n\| < \infty$.
b. Every weakly convergent sequence in $X$, and every weak*-convergent sequence in $X^*$, is bounded (with respect to the norm).


48. Suppose that X is a Banach space. 
a. The norm-closed unit ball $B = \{x \in X : \|x\| \le 1\}$ is also weakly closed. (Use Theorem 5.8d.)
b. If $E \subset X$ is bounded (with respect to the norm), so is its weak closure.
c. If $F \subset X^*$ is bounded (with respect to the norm), so is its weak* closure.
d. Every weak*-Cauchy sequence in $X^*$ converges. (Use Exercise 38.)


49. Suppose that X is an infinite-dimensional Banach space. 
a. Every nonempty weakly open set in X, and every nonempty weak*-open set 
in X*, is unbounded (with respect to the norm). 
b. Every bounded subset of $X$ is nowhere dense in the weak topology, and every bounded subset of $X^*$ is nowhere dense in the weak* topology. (Use Exercise 48b,c.)
c. X is meager in itself with respect to the weak topology, and X* is meager in 
itself with respect to the weak* topology. 
d. The weak* topology on X* is not defined by any translation-invariant metric. 
(Use Exercise 48d.) 
50. If X is a separable normed linear space, the weak* topology on the closed unit
ball in X* is second countable and hence metrizable. (But cf. Exercise 49d.) 


51. A vector subspace of a normed vector space X is norm-closed iff it is weakly 
closed. (However, a norm-closed subspace of X* need not be weak*-closed unless 
X is reflexive; see Exercise 52d.) 


52. Let $X$ be a Banach space and let $f_1, \ldots, f_n$ be linearly independent elements of $X^*$.
a. Define $T: X \to \mathbb{C}^n$ by $Tx = (f_1(x), \ldots, f_n(x))$. If $N = \{x : Tx = 0\}$ and $M$ is the linear span of $f_1, \ldots, f_n$, then $M = N^0$ in the notation of Exercise 23, and hence $M$ is isomorphic to $(X/N)^*$.
b. If $F \in X^{**}$, for any $\epsilon > 0$ there exists $x \in X$ such that $F(f_j) = f_j(x)$ for $j = 1, \ldots, n$ and $\|x\| \le (1 + \epsilon)\|F\|$. ($F|M$ can be identified with an element of $(X/N)^{**}$ and hence with an element of $X/N$ since the latter is finite-dimensional.)
c. If X is considered as a subspace of X**, the relative topology on X induced 
by the weak* topology on X** is the weak topology on X. 
d. In the weak* topology on X**, X is dense in X** and the closed unit ball in 
X is dense in the closed unit ball in X**. 
e. X is reflexive iff its closed unit ball is weakly compact. 


53. Suppose that $X$ is a Banach space and $\{T_n\}$, $\{S_n\}$ are sequences in $L(X,X)$ such that $T_n \to T$ strongly and $S_n \to S$ strongly.
a. If $\{x_n\} \subset X$ and $\|x_n - x\| \to 0$, then $\|T_n x_n - Tx\| \to 0$. (Use Exercise 47a.)
b. $T_n S_n \to TS$ strongly.


5.5 HILBERT SPACES 


The most important Banach spaces, and the ones on which the most refined analysis 
can be done, are the Hilbert spaces, which are a direct generalization of finite- 
dimensional Euclidean spaces. Before defining them, we need to introduce a few 
concepts. 

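Let $\mathcal{H}$ be a complex vector space. An inner product (or scalar product) on $\mathcal{H}$ is a map $(x,y) \mapsto \langle x, y \rangle$ from $\mathcal{H} \times \mathcal{H}$ to $\mathbb{C}$ such that:

i. $\langle ax + by, z \rangle = a\langle x, z \rangle + b\langle y, z \rangle$ for all $x, y, z \in \mathcal{H}$ and $a, b \in \mathbb{C}$.
ii. $\langle y, x \rangle = \overline{\langle x, y \rangle}$ for all $x, y \in \mathcal{H}$.
iii. $\langle x, x \rangle \in (0, \infty)$ for all nonzero $x \in \mathcal{H}$.

We observe that (i) and (ii) imply that

$$\langle x, ay + bz \rangle = \bar{a}\langle x, y \rangle + \bar{b}\langle x, z \rangle \quad\text{for all } x, y, z \in \mathcal{H} \text{ and } a, b \in \mathbb{C}.$$

(One can also define inner products on real vector spaces: $\langle x, y \rangle$ is then real, $a$ and $b$ are assumed real in (i), and (ii) becomes $\langle y, x \rangle = \langle x, y \rangle$.)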

A complex vector space equipped with an inner product is called a pre-Hilbert space. If $\mathcal{H}$ is a pre-Hilbert space, for $x \in \mathcal{H}$ we define

$$\|x\| = \sqrt{\langle x, x \rangle}.$$


5.19 The Schwarz Inequality. $|\langle x, y \rangle| \le \|x\|\,\|y\|$ for all $x, y \in \mathcal{H}$, with equality iff $x$ and $y$ are linearly dependent.


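Proof. If $\langle x, y \rangle = 0$, the result is obvious. If $\langle x, y \rangle \ne 0$ (and in particular $y \ne 0$), let $\alpha = \operatorname{sgn}\langle x, y \rangle$ and $z = \alpha y$, so that $\langle x, z \rangle = \langle z, x \rangle = |\langle x, y \rangle|$ and $\|z\| = \|y\|$. Then for $t \in \mathbb{R}$ we have

$$0 \le \|x - tz\|^2 = \langle x - tz, x - tz \rangle = \|x\|^2 - 2t|\langle x, y \rangle| + t^2\|y\|^2.$$

The expression on the right is a quadratic function of $t$ whose absolute minimum occurs at $t = \|y\|^{-2}|\langle x, y \rangle|$. Setting $t$ equal to this value, we obtain

$$0 \le \|x - tz\|^2 = \|x\|^2 - \|y\|^{-2}|\langle x, y \rangle|^2,$$

with equality iff $x - tz = x - \alpha t y = 0$, from which the desired result is immediate. ∎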


5.20 Proposition. The function $x \mapsto \|x\|$ is a norm on $\mathcal{H}$.


Proof. That $\|x\| = 0$ iff $x = 0$ and that $\|\lambda x\| = |\lambda|\,\|x\|$ are obvious from the definition. As for the triangle inequality, we have

$$\|x + y\|^2 = \langle x + y, x + y \rangle = \|x\|^2 + 2\operatorname{Re}\langle x, y \rangle + \|y\|^2,$$

so by the Schwarz inequality,

$$\|x + y\|^2 \le \|x\|^2 + 2\|x\|\,\|y\| + \|y\|^2 = (\|x\| + \|y\|)^2,$$

as desired. ∎


A pre-Hilbert space that is complete with respect to the norm $\|x\| = \sqrt{\langle x, x \rangle}$
is called a Hilbert space. (One can also consider real Hilbert spaces with real 
inner products. However, Hilbert spaces are usually assumed to be complex unless 
otherwise specified.) 

Example: Let $(X, \mathcal{M}, \mu)$ be a measure space, and let $L^2(\mu)$ be the set of all measurable functions $f: X \to \mathbb{C}$ such that $\int|f|^2\,d\mu < \infty$ (where, as usual, we identify two functions that are equal a.e.). From the inequality $ab \le \frac{1}{2}(a^2 + b^2)$, valid for all $a, b \ge 0$, we see that if $f, g \in L^2(\mu)$ then $|f\bar{g}| \le \frac{1}{2}(|f|^2 + |g|^2)$, so that $f\bar{g} \in L^1(\mu)$. It follows easily that the formula

$$\langle f, g \rangle = \int f\bar{g}\,d\mu$$

defines an inner product on $L^2(\mu)$. In fact, $L^2(\mu)$ is a Hilbert space for any measure $\mu$. We shall prove completeness in Theorem 6.6; for the present we shall take this result for granted.

An important special case of this construction is obtained by taking $\mu$ to be counting measure on $(A, \mathcal{P}(A))$, where $A$ is any nonempty set; in this situation $L^2(\mu)$ is usually denoted by $l^2(A)$. Thus, $l^2(A)$ is the set of functions $f: A \to \mathbb{C}$ such that the sum $\sum_{\alpha \in A}|f(\alpha)|^2$ (as defined in §0.5) is finite. The completeness of $l^2(A)$ is rather easy to prove directly (Exercise 54).

For the remainder of this section, $\mathcal{H}$ will denote a Hilbert space.


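5.21 Proposition. If $x_n \to x$ and $y_n \to y$, then $\langle x_n, y_n \rangle \to \langle x, y \rangle$.
Proof. By the Schwarz inequality,

$$|\langle x_n, y_n \rangle - \langle x, y \rangle| = |\langle x_n - x, y_n \rangle + \langle x, y_n - y \rangle| \le \|x_n - x\|\,\|y_n\| + \|x\|\,\|y_n - y\|,$$

which tends to zero since $\|y_n\| \to \|y\|$. ∎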


5.22 The Parallelogram Law. For all $x, y \in \mathcal{H}$,

$$\|x + y\|^2 + \|x - y\|^2 = 2(\|x\|^2 + \|y\|^2).$$

("The sum of the squares of the diagonals of a parallelogram is the sum of the squares of the four sides.")

Proof. Add the two formulas $\|x \pm y\|^2 = \|x\|^2 \pm 2\operatorname{Re}\langle x, y \rangle + \|y\|^2$. ∎
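As a quick numerical sanity check of 5.19, 5.20, and 5.22, the sketch below works in $\mathbb{C}^5$ with the standard inner product $\langle x, y \rangle = \sum x_j\overline{y_j}$, which is linear in the first argument as in (i). It is an illustration on random vectors, not a proof; the dimension and the random seed are arbitrary choices.

```python
import random


def inner(x, y):
    """<x, y> = sum_j x_j * conj(y_j): linear in x, conjugate-linear in y."""
    return sum(a * b.conjugate() for a, b in zip(x, y))


def norm(x):
    return inner(x, x).real ** 0.5


def add(x, y):
    return [a + b for a, b in zip(x, y)]


def sub(x, y):
    return [a - b for a, b in zip(x, y)]


rng = random.Random(1)
x = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(5)]
y = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(5)]

schwarz_gap = norm(x) * norm(y) - abs(inner(x, y))      # >= 0 by 5.19
triangle_gap = norm(x) + norm(y) - norm(add(x, y))       # >= 0 by 5.20
parallelogram_error = (norm(add(x, y)) ** 2 + norm(sub(x, y)) ** 2
                       - 2 * (norm(x) ** 2 + norm(y) ** 2))  # ~ 0 by 5.22

# Equality in the Schwarz inequality for linearly dependent vectors:
w = [(2 - 1j) * a for a in x]
dependent_gap = norm(x) * norm(w) - abs(inner(x, w))
```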


If $x, y \in \mathcal{H}$, we say that $x$ is orthogonal to $y$ and write $x \perp y$ if $\langle x, y \rangle = 0$. If $E \subset \mathcal{H}$, we define

$$E^\perp = \{x \in \mathcal{H} : \langle x, y \rangle = 0 \text{ for all } y \in E\}.$$

It is immediate from Proposition 5.21 and the linearity of the inner product in its first argument that $E^\perp$ is a closed subspace of $\mathcal{H}$.


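5.23 The Pythagorean Theorem. If $x_1, \ldots, x_n \in \mathcal{H}$ and $x_j \perp x_k$ for $j \ne k$,

$$\Big\|\sum_1^n x_j\Big\|^2 = \sum_1^n \|x_j\|^2.$$

Proof. $\|\sum_1^n x_j\|^2 = \langle \sum_1^n x_j, \sum_1^n x_k \rangle = \sum_{j,k=1}^n \langle x_j, x_k \rangle$. The terms with $k \ne j$ are all zero, leaving only $\sum_1^n \langle x_j, x_j \rangle = \sum_1^n \|x_j\|^2$. ∎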


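5.24 Theorem. If $\mathcal{M}$ is a closed subspace of $\mathcal{H}$, then $\mathcal{H} = \mathcal{M} \oplus \mathcal{M}^\perp$; that is, each $x \in \mathcal{H}$ can be expressed uniquely as $x = y + z$ where $y \in \mathcal{M}$ and $z \in \mathcal{M}^\perp$. Moreover, $y$ and $z$ are the unique elements of $\mathcal{M}$ and $\mathcal{M}^\perp$ whose distance to $x$ is minimal.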
Proof. Given $x \in \mathcal{H}$, let $\delta = \inf\{\|x - y\| : y \in \mathcal{M}\}$, and let $\{y_n\}$ be a sequence in $\mathcal{M}$ such that $\|x - y_n\| \to \delta$. By the parallelogram law,

$$2(\|y_n - x\|^2 + \|y_m - x\|^2) = \|y_n - y_m\|^2 + \|y_n + y_m - 2x\|^2,$$

so since $\frac{1}{2}(y_n + y_m) \in \mathcal{M}$,

$$\|y_n - y_m\|^2 = 2\|y_n - x\|^2 + 2\|y_m - x\|^2 - 4\big\|\tfrac{1}{2}(y_n + y_m) - x\big\|^2 \le 2\|y_n - x\|^2 + 2\|y_m - x\|^2 - 4\delta^2.$$

As $m, n \to \infty$ this last quantity tends to zero, so $\{y_n\}$ is a Cauchy sequence. Let $y = \lim y_n$ and $z = x - y$. Then $y \in \mathcal{M}$ since $\mathcal{M}$ is closed, and $\|x - y\| = \delta$.

We claim that $z \in \mathcal{M}^\perp$. Indeed, if $u \in \mathcal{M}$, after multiplying $u$ by a nonzero scalar we may assume that $\langle z, u \rangle$ is real. Then the function

$$f(t) = \|z + tu\|^2 = \|z\|^2 + 2t\langle z, u \rangle + t^2\|u\|^2$$

is real for $t \in \mathbb{R}$, and it has a minimum (namely, $\delta^2$) at $t = 0$ because $z + tu = x - (y - tu)$ and $y - tu \in \mathcal{M}$. Thus $2\langle z, u \rangle = f'(0) = 0$, so $z \in \mathcal{M}^\perp$. Moreover, if $z'$ is another element of $\mathcal{M}^\perp$, by the Pythagorean theorem (since $x - z = y \in \mathcal{M}$) we have

$$\|x - z'\|^2 = \|x - z\|^2 + \|z - z'\|^2 \ge \|x - z\|^2,$$

with equality iff $z = z'$. The same reasoning shows that $y$ is the unique element of $\mathcal{M}$ closest to $x$.

Finally, if $x = y' + z'$ with $y' \in \mathcal{M}$ and $z' \in \mathcal{M}^\perp$, then $y - y' = z' - z \in \mathcal{M} \cap \mathcal{M}^\perp$, so $y - y'$ and $z' - z$ are orthogonal to themselves and hence are zero. ∎
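The following is a minimal finite-dimensional sketch of Theorem 5.24. When $\mathcal{M}$ has a known orthonormal basis (here a hypothetical one in $\mathbb{C}^3$), the closest point $y$ is $\sum\langle x, u \rangle u$ (cf. Exercise 58c). The code checks numerically that $z = x - y$ is orthogonal to $\mathcal{M}$ and that randomly chosen points of $\mathcal{M}$ are no closer to $x$ than $y$ is.

```python
import random


def inner(x, y):
    return sum(a * b.conjugate() for a, b in zip(x, y))


def norm(x):
    return inner(x, x).real ** 0.5


def project(x, onb):
    """y = sum_u <x, u> u, for an orthonormal basis `onb` of the subspace M."""
    y = [0j] * len(x)
    for u in onb:
        c = inner(x, u)
        y = [yi + c * ui for yi, ui in zip(y, u)]
    return y


s = 2 ** -0.5
onb = [[s, 1j * s, 0], [0, 0, 1]]  # hypothetical orthonormal basis of M in C^3
x = [3, -1 + 2j, 4j]
y = project(x, onb)
z = [a - b for a, b in zip(x, y)]

# Compare ||x - y|| with ||x - m|| for randomly chosen m = a u_1 + b u_2 in M.
rng = random.Random(2)
others = []
for _ in range(200):
    a = complex(rng.uniform(-5, 5), rng.uniform(-5, 5))
    b = complex(rng.uniform(-5, 5), rng.uniform(-5, 5))
    m = [a * p + b * q for p, q in zip(*onb)]
    others.append(norm([xi - mi for xi, mi in zip(x, m)]))
```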


If $y \in \mathcal{H}$, the Schwarz inequality shows that the formula $f_y(x) = \langle x, y \rangle$ defines a bounded linear functional on $\mathcal{H}$ such that $\|f_y\| = \|y\|$. Thus, the map $y \mapsto f_y$ is a conjugate-linear isometry of $\mathcal{H}$ into $\mathcal{H}^*$. It is a fundamental fact that this map is surjective:


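5.25 Theorem. If $f \in \mathcal{H}^*$, there is a unique $y \in \mathcal{H}$ such that $f(x) = \langle x, y \rangle$ for all $x \in \mathcal{H}$.

Proof. Uniqueness is easy: If $\langle x, y \rangle = \langle x, y' \rangle$ for all $x$, by taking $x = y - y'$ we conclude that $\|y - y'\|^2 = 0$ and hence $y = y'$. If $f$ is the zero functional, then obviously $y = 0$. Otherwise, let $\mathcal{M} = \{x \in \mathcal{H} : f(x) = 0\}$. Then $\mathcal{M}$ is a proper closed subspace of $\mathcal{H}$, so $\mathcal{M}^\perp \ne \{0\}$ by Theorem 5.24. Pick $z \in \mathcal{M}^\perp$ with $\|z\| = 1$. If $u = f(x)z - f(z)x$ then $u \in \mathcal{M}$, so

$$0 = \langle u, z \rangle = f(x)\|z\|^2 - f(z)\langle x, z \rangle = f(x) - \langle x, \overline{f(z)}z \rangle.$$

Hence $f(x) = \langle x, y \rangle$ where $y = \overline{f(z)}z$. ∎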
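The construction in the proof can be traced in $\mathbb{C}^3$. In this sketch the functional $f(x) = \sum c_j x_j$ and its coefficients are hypothetical. Here $\mathcal{M}^\perp$ is spanned by $\bar{c} = (\overline{c_1}, \overline{c_2}, \overline{c_3})$, and the recipe $y = \overline{f(z)}z$ with $z = \bar{c}/\|\bar{c}\|$ recovers $y = \bar{c}$.

```python
import random


def inner(x, y):
    return sum(a * b.conjugate() for a, b in zip(x, y))


def norm(x):
    return inner(x, x).real ** 0.5


c = [2 - 1j, 0.5j, -3 + 0j]  # hypothetical coefficients


def f(x):
    """A bounded linear functional on C^3: f(x) = sum_j c_j x_j."""
    return sum(cj * xj for cj, xj in zip(c, x))


# M = ker f; since f(x) = <x, conj(c)>, the vector conj(c) spans M-perp.
w = [cj.conjugate() for cj in c]
z = [wj / norm(w) for wj in w]            # unit vector in M-perp
y = [f(z).conjugate() * zj for zj in z]   # y = conj(f(z)) z, as in the proof

# Check f(v) = <v, y> on random vectors v.
rng = random.Random(3)
checks = []
for _ in range(50):
    v = [complex(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(3)]
    checks.append(abs(f(v) - inner(v, y)))
```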
Thus, Hilbert spaces are reflexive in a very strong sense: Not only is $\mathcal{H}$ naturally isomorphic to $\mathcal{H}^{**}$, it is naturally isomorphic (via a conjugate-linear map) to $\mathcal{H}^*$.

A subset $\{u_\alpha\}_{\alpha \in A}$ of $\mathcal{H}$ is called orthonormal if $\|u_\alpha\| = 1$ for all $\alpha$ and $u_\alpha \perp u_\beta$ whenever $\alpha \ne \beta$. If $\{x_n\}_1^\infty$ is a linearly independent sequence in $\mathcal{H}$, there is a standard inductive procedure, called the Gram-Schmidt process, for converting $\{x_n\}$ into an orthonormal sequence $\{u_n\}$ such that the linear span of $\{x_n\}_1^N$ coincides with the linear span of $\{u_n\}_1^N$ for all $N$. Namely, the first step is to set $u_1 = x_1/\|x_1\|$. Having defined $u_1, \ldots, u_{N-1}$, we set $v_N = x_N - \sum_1^{N-1}\langle x_N, u_n \rangle u_n$. Then $v_N$ is nonzero because $x_N$ is not in the linear span of $x_1, \ldots, x_{N-1}$, and hence of $u_1, \ldots, u_{N-1}$, and $\langle v_N, u_m \rangle = \langle x_N, u_m \rangle - \langle x_N, u_m \rangle = 0$ for all $m < N$. We can therefore take $u_N = v_N/\|v_N\|$.
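The Gram-Schmidt recursion above translates directly into code. This is a minimal sketch for vectors in $\mathbb{C}^n$ stored as Python lists, using the same inner product convention as the text. The tolerance used to detect (numerical) dependence is an arbitrary choice. The sketch implements the classical form, which subtracts $\langle x_N, u_n \rangle u_n$ exactly as written. In floating point the "modified" variant, which projects the running remainder instead of $x_N$, is usually more stable; the two agree in exact arithmetic.

```python
def inner(x, y):
    return sum(a * b.conjugate() for a, b in zip(x, y))


def norm(x):
    return inner(x, x).real ** 0.5


def gram_schmidt(xs, tol=1e-12):
    """Orthonormalize a linearly independent list of vectors in C^n.

    Classical form of the recursion in the text:
    v_N = x_N - sum_{n<N} <x_N, u_n> u_n, then u_N = v_N / ||v_N||.
    """
    us = []
    for x in xs:
        v = list(x)
        for u in us:
            c = inner(x, u)  # <x_N, u_n>, computed from x_N itself
            v = [vi - c * ui for vi, ui in zip(v, u)]
        nv = norm(v)
        if nv < tol:
            raise ValueError("vectors are (numerically) linearly dependent")
        us.append([vi / nv for vi in v])
    return us


xs = [[1, 1j, 0], [1, 0, 1], [0, 2, 1j]]
us = gram_schmidt(xs)
```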
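5.26 Bessel's Inequality. If $\{u_\alpha\}_{\alpha \in A}$ is an orthonormal set in $\mathcal{H}$, then for any $x \in \mathcal{H}$,

$$\sum_{\alpha \in A}|\langle x, u_\alpha \rangle|^2 \le \|x\|^2.$$

In particular, $\{\alpha : \langle x, u_\alpha \rangle \ne 0\}$ is countable.

Proof. It suffices to show that $\sum_{\alpha \in F}|\langle x, u_\alpha \rangle|^2 \le \|x\|^2$ for any finite $F \subset A$. But

$$\begin{aligned}
0 &\le \Big\|x - \sum_{\alpha \in F}\langle x, u_\alpha \rangle u_\alpha\Big\|^2 \\
&= \|x\|^2 - 2\operatorname{Re}\Big\langle x, \sum_{\alpha \in F}\langle x, u_\alpha \rangle u_\alpha \Big\rangle + \Big\|\sum_{\alpha \in F}\langle x, u_\alpha \rangle u_\alpha\Big\|^2 \\
&= \|x\|^2 - 2\sum_{\alpha \in F}|\langle x, u_\alpha \rangle|^2 + \sum_{\alpha \in F}|\langle x, u_\alpha \rangle|^2 \\
&= \|x\|^2 - \sum_{\alpha \in F}|\langle x, u_\alpha \rangle|^2,
\end{aligned}$$

where the Pythagorean theorem was used in the third line. ∎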
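Bessel's inequality and Parseval's identity (Theorem 5.27b below) can be compared numerically. The sketch assumes $\mathbb{C}^3$ with a hypothetical orthonormal basis. Dropping one basis vector leaves an orthonormal set that is not complete, and for it the inequality is strict for this $x$. The full basis gives equality.

```python
def inner(x, y):
    return sum(a * b.conjugate() for a, b in zip(x, y))


def norm_sq(x):
    return inner(x, x).real


def coefficient_energy(x, orthonormal_set):
    """sum over the set of |<x, u>|^2, the left side of Bessel's inequality."""
    return sum(abs(inner(x, u)) ** 2 for u in orthonormal_set)


s = 2 ** -0.5
full_basis = [[s, s, 0], [s, -s, 0], [0, 0, 1]]  # an orthonormal basis of C^3
partial = full_basis[:2]                          # orthonormal but not complete
x = [1 + 2j, -3, 0.5j]

bessel_gap = norm_sq(x) - coefficient_energy(x, partial)         # > 0 here
parseval_error = norm_sq(x) - coefficient_energy(x, full_basis)  # ~ 0
```

By the computation in the proof, `bessel_gap` equals $\|x - \sum_{\alpha \in F}\langle x, u_\alpha \rangle u_\alpha\|^2$, which here is $|x_3|^2 = 0.25$.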


5.27 Theorem. If $\{u_\alpha\}_{\alpha \in A}$ is an orthonormal set in $\mathcal{H}$, the following are equivalent:
a. (Completeness) If $\langle x, u_\alpha \rangle = 0$ for all $\alpha$, then $x = 0$.
b. (Parseval's Identity) $\|x\|^2 = \sum_{\alpha \in A}|\langle x, u_\alpha \rangle|^2$ for all $x \in \mathcal{H}$.
c. For each $x \in \mathcal{H}$, $x = \sum_{\alpha \in A}\langle x, u_\alpha \rangle u_\alpha$, where the sum on the right has only countably many nonzero terms and converges in the norm topology no matter how these terms are ordered.


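Proof. (a) implies (c): If $x \in \mathcal{H}$, let $\alpha_1, \alpha_2, \ldots$ be any enumeration of the $\alpha$'s for which $\langle x, u_\alpha \rangle \ne 0$. By Bessel's inequality the series $\sum_1^\infty|\langle x, u_{\alpha_j} \rangle|^2$ converges, so by the Pythagorean theorem,

$$\Big\|\sum_{j=n}^m \langle x, u_{\alpha_j} \rangle u_{\alpha_j}\Big\|^2 = \sum_{j=n}^m |\langle x, u_{\alpha_j} \rangle|^2 \to 0 \quad\text{as } m, n \to \infty.$$

The series $\sum_1^\infty \langle x, u_{\alpha_j} \rangle u_{\alpha_j}$ therefore converges since $\mathcal{H}$ is complete. If $y = x - \sum_1^\infty \langle x, u_{\alpha_j} \rangle u_{\alpha_j}$, then clearly $\langle y, u_\alpha \rangle = 0$ for all $\alpha$, so by (a), $y = 0$.

(c) implies (b): With notation as above, as in the proof of Bessel's inequality we have

$$\|x\|^2 - \sum_1^n |\langle x, u_{\alpha_j} \rangle|^2 = \Big\|x - \sum_1^n \langle x, u_{\alpha_j} \rangle u_{\alpha_j}\Big\|^2 \to 0 \quad\text{as } n \to \infty.$$

Finally, that (b) implies (a) is obvious. ∎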


An orthonormal set having the properties (a)-(c) in Theorem 5.27 is called an orthonormal basis for $\mathcal{H}$. For example, let $\mathcal{H} = l^2(A)$. For each $\alpha \in A$, define $e_\alpha \in l^2(A)$ by $e_\alpha(\beta) = 1$ if $\beta = \alpha$, $e_\alpha(\beta) = 0$ otherwise. The set $\{e_\alpha\}_{\alpha \in A}$ is clearly orthonormal, and for any $f \in l^2(A)$ we have $\langle f, e_\alpha \rangle = f(\alpha)$, from which it follows that $\{e_\alpha\}$ is an orthonormal basis.


5.28 Proposition. Every Hilbert space has an orthonormal basis. 


Proof. A routine application of Zorn’s lemma shows that the collection of or- 
thonormal sets, ordered by inclusion, has a maximal element; and maximality is 
equivalent to property (a) in Theorem 5.27. E 


5.29 Proposition. A Hilbert space $\mathcal{H}$ is separable iff it has a countable orthonormal basis, in which case every orthonormal basis for $\mathcal{H}$ is countable.


Proof. If $\{x_n\}$ is a countable dense set in $\mathcal{H}$, by discarding recursively any $x_n$ that is in the linear span of $x_1, \ldots, x_{n-1}$ we obtain a linearly independent sequence $\{y_n\}$ whose linear span is dense in $\mathcal{H}$. Application of the Gram-Schmidt process to $\{y_n\}$ yields an orthonormal sequence $\{u_n\}$ whose linear span is dense in $\mathcal{H}$ and which is therefore a basis. Conversely, if $\{u_n\}$ is a countable orthonormal basis, the finite linear combinations of the $u_n$'s with coefficients in a countable dense subset of $\mathbb{C}$ form a countable dense set in $\mathcal{H}$. Moreover, if $\{v_\alpha\}_{\alpha \in A}$ is another orthonormal basis, for each $n$ the set $A_n = \{\alpha \in A : \langle u_n, v_\alpha \rangle \ne 0\}$ is countable. By completeness of $\{u_n\}$, $A = \bigcup_1^\infty A_n$, so $A$ is countable. ∎


Most Hilbert spaces that arise in practice are separable. We discuss some examples 
in Exercises 60-62. 

If $\mathcal{H}_1$ and $\mathcal{H}_2$ are Hilbert spaces with inner products $\langle \cdot, \cdot \rangle_1$ and $\langle \cdot, \cdot \rangle_2$, a unitary map from $\mathcal{H}_1$ to $\mathcal{H}_2$ is an invertible linear map $U: \mathcal{H}_1 \to \mathcal{H}_2$ that preserves inner products:

$$\langle Ux, Uy \rangle_2 = \langle x, y \rangle_1 \quad\text{for all } x, y \in \mathcal{H}_1.$$

By taking $y = x$, we see that every unitary map is an isometry: $\|Ux\|_2 = \|x\|_1$. Conversely, every surjective isometry is unitary (Exercise 55). Unitary maps are the true "isomorphisms" in the category of Hilbert spaces; they preserve not only the linear structure and the topology but also the norm and the inner product. From the point of view of this abstract structure, every Hilbert space looks like an $l^2$ space:
5.30 Proposition. Let $\{u_\alpha\}_{\alpha \in A}$ be an orthonormal basis for $\mathcal{H}$. Then the correspondence $x \mapsto \hat{x}$ defined by $\hat{x}(\alpha) = \langle x, u_\alpha \rangle$ is a unitary map from $\mathcal{H}$ to $l^2(A)$.
Proof. The map $x \mapsto \hat{x}$ is clearly linear, and it is an isometry from $\mathcal{H}$ to $l^2(A)$ by the Parseval identity $\|x\|^2 = \sum|\hat{x}(\alpha)|^2$. If $f \in l^2(A)$ then $\sum|f(\alpha)|^2 < \infty$, so the Pythagorean theorem shows that the partial sums of the series $\sum f(\alpha)u_\alpha$ (of which only countably many terms are nonzero) are Cauchy; hence $x = \sum f(\alpha)u_\alpha$ exists in $\mathcal{H}$ and $\hat{x} = f$. By Exercise 55b, $x \mapsto \hat{x}$ is unitary. ∎


Exercises 
54. For any nonempty set $A$, $l^2(A)$ is complete.


55. Let $\mathcal{H}$ be a Hilbert space.
a. (The polarization identity) For any $x, y \in \mathcal{H}$,

$$\langle x, y \rangle = \tfrac{1}{4}\big(\|x + y\|^2 - \|x - y\|^2 + i\|x + iy\|^2 - i\|x - iy\|^2\big).$$

(Completeness is not needed here.)
b. If $\mathcal{H}'$ is another Hilbert space, a linear map from $\mathcal{H}$ to $\mathcal{H}'$ is unitary iff it is isometric and surjective.


56. If $E$ is a subset of a Hilbert space $\mathcal{H}$, $(E^\perp)^\perp$ is the smallest closed subspace of $\mathcal{H}$ containing $E$.


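57. Suppose that $\mathcal{H}$ is a Hilbert space and $T \in L(\mathcal{H}, \mathcal{H})$.
a. There is a unique $T^* \in L(\mathcal{H}, \mathcal{H})$, called the adjoint of $T$, such that $\langle Tx, y \rangle = \langle x, T^*y \rangle$ for all $x, y \in \mathcal{H}$. (Cf. Exercise 22. We have $T^* = V^{-1}T^\dagger V$ where $V$ is the conjugate-linear isomorphism from $\mathcal{H}$ to $\mathcal{H}^*$ in Theorem 5.25, $(Vy)(x) = \langle x, y \rangle$.)
b. $\|T^*\| = \|T\|$, $\|T^*T\| = \|T\|^2$, $(aS + bT)^* = \bar{a}S^* + \bar{b}T^*$, $(ST)^* = T^*S^*$, and $T^{**} = T$.
c. Let $\mathcal{R}$ and $\mathcal{N}$ denote range and nullspace; then $\mathcal{R}(T)^\perp = \mathcal{N}(T^*)$ and $\mathcal{N}(T)^\perp = \overline{\mathcal{R}(T^*)}$.
d. $T$ is unitary iff $T$ is invertible and $T^{-1} = T^*$.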


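58. Let $\mathcal{M}$ be a closed subspace of the Hilbert space $\mathcal{H}$, and for $x \in \mathcal{H}$ let $Px$ be the element of $\mathcal{M}$ such that $x - Px \in \mathcal{M}^\perp$ as in Theorem 5.24.
a. $P \in L(\mathcal{H}, \mathcal{H})$, and in the notation of Exercise 57 we have $P^* = P$, $P^2 = P$, $\mathcal{R}(P) = \mathcal{M}$, and $\mathcal{N}(P) = \mathcal{M}^\perp$. $P$ is called the orthogonal projection onto $\mathcal{M}$.
b. Conversely, suppose that $P \in L(\mathcal{H}, \mathcal{H})$ satisfies $P^2 = P^* = P$. Then $\mathcal{R}(P)$ is closed and $P$ is the orthogonal projection onto $\mathcal{R}(P)$.
c. If $\{u_\alpha\}$ is an orthonormal basis for $\mathcal{M}$, then $Px = \sum\langle x, u_\alpha \rangle u_\alpha$.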





59. Every closed convex set $K$ in a Hilbert space has a unique element of minimal norm. (If $0 \in K$, the result is trivial; otherwise, adapt the proof of Theorem 5.24.)


60. Let $(X, \mathcal{M}, \mu)$ be a measure space. If $E \in \mathcal{M}$, we identify $L^2(E, \mu)$ with the subspace of $L^2(X, \mu)$ consisting of functions that vanish outside $E$. If $\{E_n\}$ is a disjoint sequence in $\mathcal{M}$ with $X = \bigcup_1^\infty E_n$, then $\{L^2(E_n, \mu)\}$ is a sequence of mutually orthogonal subspaces of $L^2(X, \mu)$, and every $f \in L^2(X, \mu)$ can be written uniquely as $f = \sum_1^\infty f_n$ (the series converging in norm) where $f_n \in L^2(E_n, \mu)$. If $L^2(E_n, \mu)$ is separable for every $n$, so is $L^2(X, \mu)$.


61. Let $(X, \mathcal{M}, \mu)$ and $(Y, \mathcal{N}, \nu)$ be $\sigma$-finite measure spaces such that $L^2(\mu)$ and $L^2(\nu)$ are separable. If $\{f_m\}$ and $\{g_n\}$ are orthonormal bases for $L^2(\mu)$ and $L^2(\nu)$ and $h_{mn}(x, y) = f_m(x)g_n(y)$, then $\{h_{mn}\}$ is an orthonormal basis for $L^2(\mu \times \nu)$.


62. In this exercise the measure defining the $L^2$ spaces is Lebesgue measure.
a. $C([0,1])$ is dense in $L^2([0,1])$. (Adapt the proof of Theorem 2.26.)
b. The set of polynomials is dense in $L^2([0,1])$.
c. $L^2([0,1])$ is separable.
d. $L^2(\mathbb{R})$ is separable. (Use Exercise 60.)
e. $L^2(\mathbb{R}^n)$ is separable. (Use Exercise 61.)


63. Let $\mathcal{H}$ be an infinite-dimensional Hilbert space.
a. Every orthonormal sequence in $\mathcal{H}$ converges weakly to 0.
b. The unit sphere $S = \{x : \|x\| = 1\}$ is weakly dense in the unit ball $B = \{x : \|x\| \le 1\}$. (In fact, every $x \in B$ is the weak limit of a sequence in $S$.)


64. Let $\mathcal{H}$ be a separable infinite-dimensional Hilbert space with orthonormal basis $\{u_n\}_1^\infty$.
a. For $k \in \mathbb{N}$, define $L_k \in L(\mathcal{H}, \mathcal{H})$ by $L_k(\sum_1^\infty a_n u_n) = \sum_{k+1}^\infty a_n u_{n-k}$. Then $L_k \to 0$ in the strong operator topology but not in the norm topology.
b. For $k \in \mathbb{N}$, define $R_k \in L(\mathcal{H}, \mathcal{H})$ by $R_k(\sum_1^\infty a_n u_n) = \sum_1^\infty a_n u_{n+k}$. Then $R_k \to 0$ in the weak operator topology but not in the strong operator topology.
c. $R_k L_k \to 0$ in the strong operator topology, but $L_k R_k = I$ for all $k$. (Use Exercise 53b.)


65. $l^2(A)$ is unitarily isomorphic to $l^2(B)$ iff $\operatorname{card}(A) = \operatorname{card}(B)$.


66. Let $\mathcal{M}$ be a closed subspace of $L^2([0,1], m)$ that is contained in $C([0,1])$.
a. There exists $C > 0$ such that $\|f\|_u \le C\|f\|_2$ for all $f \in \mathcal{M}$. (Use the closed graph theorem.)
b. For each $x \in [0,1]$ there exists $g_x \in \mathcal{M}$ such that $f(x) = \langle f, g_x \rangle$ for all $f \in \mathcal{M}$, and $\|g_x\|_2 \le C$.
c. The dimension of $\mathcal{M}$ is at most $C^2$. (Hint: If $\{f_j\}$ is an orthonormal sequence in $\mathcal{M}$, $\sum|f_j(x)|^2 \le C^2$ for all $x \in [0,1]$.)


67. (The Mean Ergodic Theorem). Let $U$ be a unitary operator on the Hilbert space $H$, let $M = \{x : Ux = x\}$, let $P$ be the orthogonal projection onto $M$ (Exercise 58), and let $S_n = n^{-1}\sum_0^{n-1} U^j$. Then $S_n \to P$ in the strong operator topology. (If $x \in M$, then $S_nx = x$; if $x = y - Uy$ for some $y$, then $S_nx \to 0$. By Exercise 57d, $M = \{x : U^*x = x\}$. Apply Exercise 57c with $T = I - U$.)
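The conclusion can be checked in the simplest nontrivial setting. This is an illustrative assumption and is not taken from the exercise: $H = \mathbb{C}^3$, and $U$ is diagonal with eigenvalues $1, e^{i}, e^{2i}$, so $M$ is the first coordinate axis and $Px = (x_1, 0, 0)$. For an eigenvalue $\lambda \ne 1$ of modulus 1, the average $n^{-1}\sum_0^{n-1}\lambda^j$ has modulus at most $2/(n|1-\lambda|)$. That bound is why the error in this sketch decays like $1/n$.

```python
import cmath

# Illustrative assumption: H = C^3 and U = diag(1, e^{i}, e^{2i}), a unitary
# operator whose fixed space M is the first coordinate axis, so P x = (x_1, 0, 0).
eigs = [1.0, cmath.exp(1j), cmath.exp(2j)]

def ergodic_average(x, n):
    """S_n x = n^{-1} sum_{j=0}^{n-1} U^j x, computed coordinatewise for diagonal U."""
    out = []
    for lam, c in zip(eigs, x):
        total = sum(lam ** j for j in range(n))
        out.append(total * c / n)
    return out

x = [2.0, 1.0 - 1.0j, 3.0]
Px = [2.0, 0.0, 0.0]
for n in (10, 100, 1000, 10000):
    err = sum(abs(s - t) ** 2 for s, t in zip(ergodic_average(x, n), Px)) ** 0.5
    print(n, err)   # decreases roughly like 1/n
```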
5.6 NOTES AND REFERENCES


Functional analysis is a vast subject of which we have barely scratched the surface 
here. For the reader who wishes to learn more, Reed and Simon [112] and Rudin 
[126] are good places to start; one should also familiarize oneself with the treatises 
of Dunford and Schwartz [35] and Yosida [163]. 

Functional analysis has roots in a number of classical problems, particularly in 
the theory of differential and integral equations. The study of particular infinite- 
dimensional function spaces began in earnest around 1907 with work of F. Riesz, 
Fréchet, Schmidt, Helly, and others, and the notion of an abstract normed vector space 
appeared in papers by several authors about 1920. The research of the succeeding 
decade culminated in Banach’s classic book [9], which marked the emergence of 
functional analysis as an established discipline. Detailed historical accounts can be 
found in Dieudonné [33] and in the notes in Dunford and Schwartz [35]. 


§5.1: The integral for vector-valued functions developed in Exercise 16 is called the Bochner integral. The hypothesis that $Y$ is separable can be dropped, but the functions in $L^1$ must then be required to have separable range (after modification on a null set). A more detailed account can be found in Cohn [27] or Yosida [163].

Another approach to vector-valued integrals is as follows. Suppose that $(X, \mathcal{M}, \mu)$ is a measure space and $Y$ is a topological vector space on which the continuous linear functionals separate points. A function $f : X \to Y$ is called weakly integrable if (i) $\phi\circ f \in L^1(\mu)$ for all $\phi \in Y^*$, and (ii) there exists $y \in Y$ (necessarily unique) such that $\int \phi\circ f\,d\mu = \phi(y)$ for all $\phi \in Y^*$. In this case we set $\int f\,d\mu = y$. If $Y$ is a separable Banach space, this notion of integral coincides with the Bochner integral. See Yosida [163] and Rudin [126].


§5.3: The open mapping and closed graph theorems are due to Banach [9]. See
Grabiner [58] for an interesting comment on the relation between the proofs of the 
open mapping theorem and the Tietze extension theorem. 

The uniform boundedness principle, as we have stated it, is due to Banach and Steinhaus [10]. However, the second part of the theorem had been proved previously by what Dieudonné [33] calls the “method of the gliding hump.” That part states that if $X$ is a Banach space and $\sup_\alpha \|T_\alpha x\| < \infty$ for all $x \in X$, then $\sup_\alpha \|T_\alpha\| < \infty$. This rather pretty (and elementary) argument has been largely neglected in recent years, but a modern exposition of it can be found in Hennefeld [71].

It is simple to construct examples of unbounded linear maps $T : X \to Y$ from one normed vector space to another when $X$ is incomplete (see Exercises 29 and 30). When $X$ is complete, it is virtually impossible to do so without using the axiom of choice. The standard method is as follows. Start with an unbounded $T : X_0 \to Y$ where $X_0$ is incomplete, and let $X$ be the completion of $X_0$. Pick a basis $\{u_\alpha\}_{\alpha\in A}$ for $X_0$ (meaning that every $x \in X_0$ is a finite linear combination of the $u_\alpha$'s), and extend it to a basis $\{u_\alpha\}_{\alpha\in B}$ ($B \supset A$) for $X$. (This is where the axiom of choice comes in.) Let $M$ be the linear span of $\{u_\alpha\}_{\alpha\in B\setminus A}$, so that each $x \in X$ can be written uniquely as $x = x_0 + x_1$, where $x_0 \in X_0$ and $x_1 \in M$. Then $T$ can be extended to $X$ by setting $T(x_0 + x_1) = Tx_0$.
§5.4: Treves [150] contains a readable account of the general theory of topological vector spaces, with many concrete examples.

Alaoglu’s theorem, which was first announced in Alaoglu [3] and proved in detail 
in Alaoglu [4], supersedes a number of earlier results dealing with special cases. It 
was discovered independently by Bourbaki [19]. 


§5.5: The space envisaged by Hilbert himself was $l^2(\mathbb{N})$. The notion of an abstract Hilbert space was introduced by von Neumann [154] in his work on the mathematics of quantum mechanics. Theorem 5.25 is originally due to F. Riesz [115] in the setting of $L^2$ spaces. It is one of several representation theorems for linear functionals on various spaces that bear his name, the others being Theorems 6.15, 7.2, and 7.17. To avoid confusion, we reserve the name “Riesz representation theorem” for the latter two, which are closely related.

In the literature of quantum physics, scalar products are customarily denoted by $\langle x|y\rangle$ and are taken to be linear in the second variable and conjugate-linear in the first.








$L^p$ Spaces


$L^p$ spaces are a class of Banach spaces of functions whose norms are defined in terms of integrals and which generalize the $L^1$ spaces discussed in Chapter 2. They furnish interesting examples of the general theory of Chapter 5 and play a central role in modern analysis.


6.1 BASIC THEORY OF $L^p$ SPACES


In this chapter we shall be working on a fixed measure space $(X, \mathcal{M}, \mu)$. If $f$ is a measurable function on $X$ and $0 < p < \infty$, we define
$$\|f\|_p = \left[\int |f|^p\,d\mu\right]^{1/p}$$
(allowing the possibility that $\|f\|_p = \infty$), and we define
$$L^p(X, \mathcal{M}, \mu) = \{f : X \to \mathbb{C} : f \text{ is measurable and } \|f\|_p < \infty\}.$$


We abbreviate $L^p(X, \mathcal{M}, \mu)$ by $L^p(\mu)$, $L^p(X)$, or simply $L^p$ when this will cause no confusion. As we have done with $L^1$, we consider two functions to define the same element of $L^p$ when they are equal almost everywhere.

If $A$ is any nonempty set, we define $l^p(A)$ to be $L^p(\mu)$ where $\mu$ is counting measure on $(A, \mathcal{P}(A))$, and we denote $l^p(\mathbb{N})$ simply by $l^p$.

$L^p$ is a vector space, for if $f, g \in L^p$, then
$$|f+g|^p \le [2\max(|f|, |g|)]^p \le 2^p(|f|^p + |g|^p),$$
so that $f + g \in L^p$. Our notation suggests that $\|\cdot\|_p$ is a norm on $L^p$. Indeed, it is obvious that $\|f\|_p = 0$ iff $f = 0$ a.e. and that $\|cf\|_p = |c|\,\|f\|_p$, so the only question is the triangle inequality. It turns out that the latter is valid precisely when $p \ge 1$, so our attention will be focused almost exclusively on this case.

Before proceeding further, however, let us see why the triangle inequality fails for $p < 1$. Suppose $a > 0$, $b > 0$, and $0 < p < 1$. For $t > 0$ we have $t^{p-1} > (a+t)^{p-1}$, and by integrating from 0 to $b$ we obtain $a^p + b^p > (a+b)^p$. Thus, if $E$ and $F$ are disjoint sets of positive finite measure in $X$ and we set $a = \mu(E)^{1/p}$ and $b = \mu(F)^{1/p}$, we see that
$$\|\chi_E + \chi_F\|_p = (a^p + b^p)^{1/p} > a + b = \|\chi_E\|_p + \|\chi_F\|_p.$$
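Here is a quick numerical illustration of this failure. It assumes counting measure on two points, so that $E$ and $F$ are singletons of measure 1. Then $a = b = 1$ and $\|\chi_E + \chi_F\|_p = 2^{1/p}$, which exceeds $2$ exactly when $p < 1$. The function name `pnorm` is illustrative.

```python
# Counting measure on two points (an illustrative choice): E = {0}, F = {1},
# so ||chi_E||_p = ||chi_F||_p = 1 and ||chi_E + chi_F||_p = 2^{1/p}.
def pnorm(values, p):
    """(sum |v|^p)^{1/p}: the L^p 'norm' for counting measure (finite p > 0)."""
    return sum(abs(v) ** p for v in values) ** (1.0 / p)

chi_E, chi_F = [1.0, 0.0], [0.0, 1.0]
for p in (0.25, 0.5, 1.0, 2.0):
    lhs = pnorm([e + f for e, f in zip(chi_E, chi_F)], p)
    rhs = pnorm(chi_E, p) + pnorm(chi_F, p)
    print(p, lhs, rhs, lhs <= rhs)   # the triangle inequality fails for p < 1
```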


The cornerstone of the theory of $L^p$ spaces is Hölder’s inequality, which we now derive.


6.1 Lemma. If $a \ge 0$, $b \ge 0$, and $0 < \lambda < 1$, then
$$a^\lambda b^{1-\lambda} \le \lambda a + (1-\lambda)b,$$
with equality iff $a = b$.

Proof. The result is obvious if $b = 0$. Otherwise, dividing both sides by $b$ and setting $t = a/b$, we are reduced to showing that $t^\lambda \le \lambda t + (1-\lambda)$ with equality iff $t = 1$. But by elementary calculus, $t^\lambda - \lambda t$ is strictly increasing for $t < 1$ and strictly decreasing for $t > 1$, so its maximum value, namely $1 - \lambda$, occurs at $t = 1$. ∎
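Lemma 6.1 is the weighted arithmetic-geometric mean inequality. The Python sketch below spot-checks it on random inputs. It is a sanity check, not a proof, and the sampling ranges are arbitrary choices.

```python
import random

def geometric_side(a, b, lam):
    """a^lambda * b^(1 - lambda)."""
    return a ** lam * b ** (1 - lam)

def arithmetic_side(a, b, lam):
    """lambda * a + (1 - lambda) * b."""
    return lam * a + (1 - lam) * b

random.seed(0)
for _ in range(10000):
    a, b = random.uniform(0, 10), random.uniform(0, 10)
    lam = random.uniform(0.01, 0.99)
    # small tolerance for floating-point rounding near the equality case a = b
    assert geometric_side(a, b, lam) <= arithmetic_side(a, b, lam) + 1e-12

# Equality when a == b
print(geometric_side(3.0, 3.0, 0.3), arithmetic_side(3.0, 3.0, 0.3))
```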


6.2 Hölder’s Inequality. Suppose $1 < p < \infty$ and $p^{-1} + q^{-1} = 1$ (that is, $q = p/(p-1)$). If $f$ and $g$ are measurable functions on $X$, then
$$\|fg\|_1 \le \|f\|_p\,\|g\|_q. \tag{6.3}$$
In particular, if $f \in L^p$ and $g \in L^q$, then $fg \in L^1$. In this case equality holds in (6.3) iff $\alpha|f|^p = \beta|g|^q$ a.e. for some constants $\alpha, \beta$ with $\alpha\beta \ne 0$.

Proof. The result is trivial if $\|f\|_p = 0$ or $\|g\|_q = 0$ (since then $f = 0$ or $g = 0$ a.e.), or if $\|f\|_p = \infty$ or $\|g\|_q = \infty$. Moreover, if (6.3) holds for a particular $f$ and $g$, then it also holds for all scalar multiples of $f$ and $g$. Indeed, if $f$ and $g$ are replaced by $af$ and $bg$, both sides of (6.3) change by a factor of $|ab|$. It therefore suffices to prove that (6.3) holds when $\|f\|_p = \|g\|_q = 1$, with equality iff $|f|^p = |g|^q$ a.e. To this end, we apply Lemma 6.1 with $a = |f(x)|^p$, $b = |g(x)|^q$, and $\lambda = p^{-1}$ to obtain
$$|f(x)g(x)| \le p^{-1}|f(x)|^p + q^{-1}|g(x)|^q. \tag{6.4}$$
Integration of both sides yields
$$\|fg\|_1 \le p^{-1}\int|f|^p + q^{-1}\int|g|^q = p^{-1} + q^{-1} = 1 = \|f\|_p\,\|g\|_q.$$
Equality holds here iff it holds a.e. in (6.4), and by Lemma 6.1 this happens precisely when $|f|^p = |g|^q$ a.e. ∎
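Both (6.3) and its equality case can be checked numerically. The sketch assumes a finite measure space in which the $i$th point carries mass `w[i] > 0`, a hypothetical model chosen only for illustration. Taking $g = |f|^{p-1}$ gives $|g|^q = |f|^p$ pointwise, so equality should hold up to rounding.

```python
import random

def lp_norm(f, w, p):
    """||f||_p for a finite measure space that gives mass w[i] > 0 to point i."""
    return sum(wi * abs(fi) ** p for fi, wi in zip(f, w)) ** (1.0 / p)

def integral(h, w):
    """Integral of h against the point masses w."""
    return sum(wi * hi for hi, wi in zip(h, w))

random.seed(1)
n = 50
w = [random.uniform(0.1, 2.0) for _ in range(n)]
f = [random.gauss(0, 1) for _ in range(n)]
g = [random.gauss(0, 1) for _ in range(n)]
p = 3.0
q = p / (p - 1)   # conjugate exponent: 1/p + 1/q = 1

# Inequality (6.3): ||fg||_1 <= ||f||_p ||g||_q
fg_norm = integral([abs(a * b) for a, b in zip(f, g)], w)
assert fg_norm <= lp_norm(f, w, p) * lp_norm(g, w, q)

# Equality case: g = |f|^{p-1}, so |g|^q = |f|^p pointwise
g_eq = [abs(a) ** (p - 1) for a in f]
lhs = integral([abs(a * b) for a, b in zip(f, g_eq)], w)
rhs = lp_norm(f, w, p) * lp_norm(g_eq, w, q)
print(lhs, rhs)   # agree up to rounding error
```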
The condition $p^{-1} + q^{-1} = 1$ occurring in Hölder’s inequality turns up frequently in $L^p$ theory. If $1 < p < \infty$, the number $q = p/(p-1)$ such that $p^{-1} + q^{-1} = 1$ is called the conjugate exponent to $p$.


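6.5 Minkowski’s Inequality. If $1 \le p < \infty$ and $f, g \in L^p$, then
$$\|f+g\|_p \le \|f\|_p + \|g\|_p.$$

Proof. The result is obvious if $p = 1$ or if $f + g = 0$ a.e. Otherwise, we observe that
$$|f+g|^p \le (|f| + |g|)\,|f+g|^{p-1}$$
and apply Hölder’s inequality, noting that $(p-1)q = p$ when $q$ is the conjugate exponent to $p$:
$$\int|f+g|^p \le \|f\|_p\,\big\||f+g|^{p-1}\big\|_q + \|g\|_p\,\big\||f+g|^{p-1}\big\|_q = (\|f\|_p + \|g\|_p)\left[\int|f+g|^p\right]^{1/q}.$$
Since $f + g \in L^p$ and $f + g \ne 0$, the integral $\int|f+g|^p$ is finite and positive, so we may divide by its $(1/q)$th power. Therefore,
$$\|f+g\|_p = \left[\int|f+g|^p\right]^{1-(1/q)} \le \|f\|_p + \|g\|_p. \qquad ∎$$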
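The next sketch uses the same hypothetical finite-measure-space model, with point masses `w[i] > 0` and random data. For a few exponents $p \ge 1$ it checks the key intermediate estimate of the proof as well as the final inequality.

```python
import random

def lp_norm(f, w, p):
    """||f||_p for a finite measure space with point masses w[i] > 0."""
    return sum(wi * abs(fi) ** p for fi, wi in zip(f, w)) ** (1.0 / p)

random.seed(2)
n = 40
w = [random.uniform(0.1, 2.0) for _ in range(n)]
f = [random.gauss(0, 1) for _ in range(n)]
g = [random.gauss(0, 1) for _ in range(n)]
s = [a + b for a, b in zip(f, g)]   # f + g

for p in (1.0, 1.5, 2.0, 4.0):
    if p > 1:
        q = p / (p - 1)
        # Key step: int |f+g|^p <= (||f||_p + ||g||_p) (int |f+g|^p)^{1/q}
        total = lp_norm(s, w, p) ** p
        assert total <= (lp_norm(f, w, p) + lp_norm(g, w, p)) * total ** (1 / q) + 1e-9
    # Minkowski: ||f+g||_p <= ||f||_p + ||g||_p
    assert lp_norm(s, w, p) <= lp_norm(f, w, p) + lp_norm(g, w, p) + 1e-12
```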


This result shows that, for $p \ge 1$, $L^p$ is a normed vector space. More is true:

6.6 Theorem. For $1 \le p < \infty$, $L^p$ is a Banach space.


Proof. We use Theorem 5.1. Suppose $\{f_k\} \subset L^p$ and $\sum_1^\infty \|f_k\|_p = B < \infty$. Let $G_n = \sum_1^n |f_k|$ and $G = \sum_1^\infty |f_k|$. Then $\|G_n\|_p \le \sum_1^n \|f_k\|_p \le B$ for all $n$, so by the monotone convergence theorem, $\int G^p = \lim \int G_n^p \le B^p$. Hence $G \in L^p$, and in particular $G(x) < \infty$ a.e., which implies that the series $\sum_1^\infty f_k$ converges a.e. Denoting its sum by $F$, we have $|F| \le G$ and hence $F \in L^p$. Moreover, $|F - \sum_1^n f_k|^p \le (2G)^p \in L^1$, so by the dominated convergence theorem,
$$\Big\|F - \sum_1^n f_k\Big\|_p^p = \int\Big|\sum_{n+1}^\infty f_k\Big|^p \to 0.$$
Thus the series $\sum_1^\infty f_k$ converges in the $L^p$ norm. ∎


6.7 Proposition. For $1 \le p < \infty$, the set of simple functions $f = \sum_1^n a_j\chi_{E_j}$, where $\mu(E_j) < \infty$ for all $j$, is dense in $L^p$.

Proof. Clearly such functions are in $L^p$. If $f \in L^p$, choose a sequence $\{f_n\}$ of simple functions such that $f_n \to f$ a.e. and $|f_n| \le |f|$, according to Theorem 2.10. Then $f_n \in L^p$ and $|f_n - f|^p \le 2^p|f|^p \in L^1$, so by the dominated convergence theorem, $\|f_n - f\|_p \to 0$. Moreover, if $f_n = \sum a_j\chi_{E_j}$ where the $E_j$ are disjoint and the $a_j$ are nonzero, we must have $\mu(E_j) < \infty$ since $\sum|a_j|^p\mu(E_j) = \int|f_n|^p < \infty$. ∎
To complete the picture of $L^p$ spaces, we introduce a space corresponding to the limiting value $p = \infty$. If $f$ is a measurable function on $X$, we define
$$\|f\|_\infty = \inf\{a \ge 0 : \mu(\{x : |f(x)| > a\}) = 0\},$$
with the convention that $\inf\varnothing = \infty$. We observe that the infimum is actually attained, for
$$\{x : |f(x)| > a\} = \bigcup_1^\infty \{x : |f(x)| > a + n^{-1}\},$$
and if the sets on the right are null, so is the one on the left. $\|f\|_\infty$ is called the essential supremum of $|f|$ and is sometimes written
$$\|f\|_\infty = \operatorname{ess\,sup}_{x\in X}|f(x)|.$$
We now define
$$L^\infty = L^\infty(X, \mathcal{M}, \mu) = \{f : X \to \mathbb{C} : f \text{ is measurable and } \|f\|_\infty < \infty\},$$
with the usual convention that two functions that are equal a.e. define the same element of $L^\infty$. Thus $f \in L^\infty$ iff there is a bounded measurable function $g$ such that $f = g$ a.e.; we can take $g = f\chi_E$ where $E = \{x : |f(x)| \le \|f\|_\infty\}$.
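Consider a finite measure space in which some points may have mass zero; this is an illustrative model, not one from the text. There $\|f\|_\infty$ is simply the largest value of $|f|$ over points of positive mass, so the values of $f$ on null points are ignored. Since the infimum over $a \ge 0$ is 0 when every point is null, that case returns 0. Below is a minimal sketch with the hypothetical helper `ess_sup`.

```python
def ess_sup(f, w):
    """||f||_inf when point i carries mass w[i] >= 0 (hypothetical finite model):
    the smallest a >= 0 with mu({|f| > a}) = 0, i.e. the max of |f| over points of
    positive mass, and 0 if every point is null."""
    return max((abs(fi) for fi, wi in zip(f, w) if wi > 0), default=0.0)

f = [1.0, 50.0, -2.0, 7.0]
w = [0.5, 0.0, 1.0, 0.0]   # the values 50 and 7 sit on null points
print(ess_sup(f, w), max(abs(v) for v in f))   # 2.0 versus the ordinary sup 50.0
```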

Two remarks: First, for fixed $X$ and $\mathcal{M}$, $L^\infty(X, \mathcal{M}, \mu)$ depends on $\mu$ only insofar as $\mu$ determines which sets have measure zero; if $\mu$ and $\nu$ are mutually absolutely continuous, then $L^\infty(\mu) = L^\infty(\nu)$. Second, if $\mu$ is not semifinite, for some purposes it is appropriate to adopt a slightly different definition of $L^\infty$. This point will be explored in Exercises 23-25.

The results we have proved for $1 \le p < \infty$ extend easily to the case $p = \infty$, as follows:

6.8 Theorem.
a. If $f$ and $g$ are measurable functions on $X$, then $\|fg\|_1 \le \|f\|_1\|g\|_\infty$. If $f \in L^1$ and $g \in L^\infty$, then $\|fg\|_1 = \|f\|_1\|g\|_\infty$ iff $|g(x)| = \|g\|_\infty$ a.e. on the set where $f(x) \ne 0$.
b. $\|\cdot\|_\infty$ is a norm on $L^\infty$.
c. $\|f_n - f\|_\infty \to 0$ iff there exists $E \in \mathcal{M}$ such that $\mu(E^c) = 0$ and $f_n \to f$ uniformly on $E$.
d. $L^\infty$ is a Banach space.
e. The simple functions are dense in $L^\infty$.

The proof is left to the reader (Exercise 2).

In view of Theorem 6.8a and the formal equality $1^{-1} + \infty^{-1} = 1$, it is natural to regard 1 and $\infty$ as conjugate exponents of each other, and we do so henceforth.

Theorem 6.8c shows that $\|\cdot\|_\infty$ is closely related to, but usually not identical with, the uniform norm $\|\cdot\|_u$. However, suppose we are dealing with Lebesgue measure, or more generally any Borel measure that assigns positive values to all open sets. Then $\|f\|_\infty = \|f\|_u$ whenever $f$ is continuous, since $\{x : |f(x)| > a\}$ is open. In this situation we may use the notations $\|f\|_\infty$ and $\|f\|_u$ interchangeably, and we may regard the space of bounded continuous functions as a (closed!) subspace of $L^\infty$.

In general we have $L^p \not\subset L^q$ for all $p \ne q$. To see what is at issue, it is instructive to consider the following simple examples on $(0,\infty)$ with Lebesgue measure. Let $f_a(x) = x^{-a}$, where $a > 0$. Elementary calculus shows that $f_a\chi_{(0,1)} \in L^p$ iff $p < a^{-1}$, and $f_a\chi_{(1,\infty)} \in L^p$ iff $p > a^{-1}$. Thus we see two reasons why a function $f$ may fail to be in $L^p$: either $|f|^p$ blows up too rapidly near some point, or it fails to decay sufficiently rapidly at infinity. In the first situation the behavior of $|f|^p$ becomes worse as $p$ increases, while in the second it becomes better. In other words, if $p < q$, functions in $L^p$ can be locally more singular than functions in $L^q$, whereas functions in $L^q$ can be globally more spread out than functions in $L^p$. These somewhat imprecisely expressed ideas are actually a rather accurate guide to the general situation, concerning which we now give four precise results. The last two show that inclusions $L^p \subset L^q$ can be obtained under conditions on the measure space that disallow one of the types of bad behavior described above; for a more general result, see Exercise 5.


6.9 Proposition. If $0 < p < q < r \le \infty$, then $L^q \subset L^p + L^r$; that is, each $f \in L^q$ is the sum of a function in $L^p$ and a function in $L^r$.

Proof. If $f \in L^q$, let $E = \{x : |f(x)| > 1\}$ and set $g = f\chi_E$ and $h = f\chi_{E^c}$. Then $|g|^p = |f|^p\chi_E \le |f|^q\chi_E$, so $g \in L^p$, and $|h|^r = |f|^r\chi_{E^c} \le |f|^q\chi_{E^c}$, so $h \in L^r$. (For $r = \infty$, obviously $\|h\|_\infty \le 1$.) ∎


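6.10 Proposition. If $0 < p < q < r \le \infty$, then $L^p \cap L^r \subset L^q$ and $\|f\|_q \le \|f\|_p^\lambda\,\|f\|_r^{1-\lambda}$, where $\lambda \in (0,1)$ is defined by
$$q^{-1} = \lambda p^{-1} + (1-\lambda)r^{-1}, \quad\text{that is,}\quad \lambda = \frac{q^{-1} - r^{-1}}{p^{-1} - r^{-1}}.$$

Proof. If $r = \infty$, we have $|f|^q \le \|f\|_\infty^{q-p}|f|^p$ and $\lambda = p/q$, so
$$\|f\|_q \le \|f\|_p^{p/q}\,\|f\|_\infty^{1-(p/q)} = \|f\|_p^\lambda\,\|f\|_\infty^{1-\lambda}.$$
If $r < \infty$, we use Hölder’s inequality, taking the pair of conjugate exponents to be $p/\lambda q$ and $r/(1-\lambda)q$:
$$\int|f|^q = \int|f|^{\lambda q}|f|^{(1-\lambda)q} \le \big\||f|^{\lambda q}\big\|_{p/\lambda q}\,\big\||f|^{(1-\lambda)q}\big\|_{r/(1-\lambda)q} = \left[\int|f|^p\right]^{\lambda q/p}\left[\int|f|^r\right]^{(1-\lambda)q/r} = \|f\|_p^{\lambda q}\,\|f\|_r^{(1-\lambda)q}.$$
Taking $q$th roots, we are done. ∎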
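Proposition 6.10 can be spot-checked numerically. The sketch assumes a finite measure space with random point masses, which is hypothetical data. The helper `interpolation_exponent` computes $\lambda$ from the formula above for finite $r$.

```python
import random

def lp_norm(f, w, p):
    """||f||_p for point masses w[i] > 0 (finite p > 0)."""
    return sum(wi * abs(fi) ** p for fi, wi in zip(f, w)) ** (1.0 / p)

def interpolation_exponent(p, q, r):
    """lambda in (0, 1) with 1/q = lambda/p + (1 - lambda)/r, for finite r."""
    return (1 / q - 1 / r) / (1 / p - 1 / r)

random.seed(3)
w = [random.uniform(0.1, 3.0) for _ in range(40)]
f = [random.expovariate(1.0) for _ in range(40)]

for p, q, r in [(1, 2, 4), (0.5, 1, 3), (2, 3, 10)]:
    lam = interpolation_exponent(p, q, r)
    bound = lp_norm(f, w, p) ** lam * lp_norm(f, w, r) ** (1 - lam)
    print(p, q, r, round(lam, 4), lp_norm(f, w, q), bound)
    assert lp_norm(f, w, q) <= bound * (1 + 1e-12)
```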
6.11 Proposition. If $A$ is any set and $0 < p < q \le \infty$, then $l^p(A) \subset l^q(A)$ and $\|f\|_q \le \|f\|_p$.

Proof. Obviously $\|f\|_\infty^p = \sup_\alpha |f(\alpha)|^p \le \sum_\alpha |f(\alpha)|^p$, so that $\|f\|_\infty \le \|f\|_p$. The case $q < \infty$ then follows from Proposition 6.10: if $\lambda = p/q$,
$$\|f\|_q \le \|f\|_p^\lambda\,\|f\|_\infty^{1-\lambda} \le \|f\|_p. \qquad ∎$$
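For counting measure on a finite set, which is an illustrative choice, Proposition 6.11 says that $p \mapsto \|f\|_p$ is nonincreasing. A small sketch:

```python
import math

def lp(f, p):
    """||f||_p for counting measure on a finite set; p may be math.inf."""
    if p == math.inf:
        return max(abs(x) for x in f)
    return sum(abs(x) ** p for x in f) ** (1.0 / p)

f = [3.0, -1.0, 0.5, 0.25, 2.0]
exponents = [0.5, 1, 1.5, 2, 4, 8, math.inf]
norms = [lp(f, p) for p in exponents]
print(norms)
# ||f||_q <= ||f||_p whenever p < q
assert all(a >= b * (1 - 1e-12) for a, b in zip(norms, norms[1:]))
```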


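6.12 Proposition. If $\mu(X) < \infty$ and $0 < p < q \le \infty$, then $L^p(\mu) \supset L^q(\mu)$ and $\|f\|_p \le \|f\|_q\,\mu(X)^{(1/p)-(1/q)}$.

Proof. If $q = \infty$, this is obvious:
$$\|f\|_p^p = \int|f|^p \le \|f\|_\infty^p\int 1 = \mu(X)\,\|f\|_\infty^p.$$
If $q < \infty$, we use Hölder’s inequality with the conjugate exponents $q/p$ and $q/(q-p)$:
$$\|f\|_p^p = \int|f|^p\cdot 1 \le \big\||f|^p\big\|_{q/p}\,\|1\|_{q/(q-p)} = \|f\|_q^p\,\mu(X)^{(q-p)/q}. \qquad ∎$$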
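The following sketch checks Proposition 6.12 on a hypothetical finite measure space with random point masses, where $\mu(X)$ is the sum of the masses. It also covers the case $q = \infty$, for which $1/q = 0$.

```python
import random

def lp_norm(f, w, p):
    """||f||_p for point masses w[i] > 0; p may be float('inf')."""
    if p == float("inf"):
        return max(abs(fi) for fi, wi in zip(f, w) if wi > 0)
    return sum(wi * abs(fi) ** p for fi, wi in zip(f, w)) ** (1.0 / p)

random.seed(5)
w = [random.uniform(0.05, 0.5) for _ in range(25)]
f = [random.gauss(0, 2) for _ in range(25)]
mu_X = sum(w)   # total mass of the (hypothetical) finite measure space

for p, q in [(1, 2), (0.5, 3), (2, float("inf"))]:
    bound = lp_norm(f, w, q) * mu_X ** (1 / p - 1 / q)   # 1/inf == 0.0 in Python
    print(p, q, lp_norm(f, w, p), bound)
    assert lp_norm(f, w, p) <= bound * (1 + 1e-12)
```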


We conclude this section with a few remarks about the significance of the $L^p$ spaces. The three most obviously important ones are $L^1$, $L^2$, and $L^\infty$. With $L^1$ we are already familiar; $L^2$ is special because it is a Hilbert space; and the topology on $L^\infty$ is closely related to the topology of uniform convergence. Unfortunately, $L^1$ and $L^\infty$ are pathological in many respects, and it is more fruitful to deal with the intermediate $L^p$ spaces. One manifestation of this is the duality theory in §6.2. Another is the fact that many operators of interest in Fourier analysis and differential equations are bounded on $L^p$ for $1 < p < \infty$ but not on $L^1$ or $L^\infty$. (Some examples are mentioned in §9.4.)


Exercises 


1. When does equality hold in Minkowski’s inequality? (The answer is different for $p = 1$ and for $1 < p < \infty$. What about $p = \infty$?)


2. Prove Theorem 6.8. 


3. If $1 \le p < r \le \infty$, then $L^p \cap L^r$ is a Banach space with norm $\|f\| = \|f\|_p + \|f\|_r$, and if $p < q < r$, the inclusion map $L^p \cap L^r \to L^q$ is continuous.


4. If $1 \le p < r \le \infty$, then $L^p + L^r$ is a Banach space with norm $\|f\| = \inf\{\|g\|_p + \|h\|_r : f = g + h\}$, and if $p < q < r$, the inclusion map $L^q \to L^p + L^r$ is continuous.


5. Suppose $0 < p < q < \infty$. Then $L^p \not\subset L^q$ iff $X$ contains sets of arbitrarily small positive measure, and $L^q \not\subset L^p$ iff $X$ contains sets of arbitrarily large finite measure. (For the “if” implication: In the first case there is a disjoint sequence $\{E_n\}$ with $0 < \mu(E_n) < 2^{-n}$, and in the second case there is a disjoint sequence $\{E_n\}$ with $1 \le \mu(E_n) < \infty$. Consider $f = \sum a_n\chi_{E_n}$ for suitable constants $a_n$.) What about the case $q = \infty$?


6. Suppose $0 < p_0 < p_1 < \infty$. Find examples of functions $f$ on $(0,\infty)$ (with Lebesgue measure) such that $f \in L^p$ iff (a) $p_0 < p < p_1$, (b) $p_0 \le p \le p_1$, (c) $p = p_0$. (Consider functions of the form $f(x) = x^{-a}|\log x|^b$.)


7. If $f \in L^p \cap L^\infty$ for some $p < \infty$, so that $f \in L^q$ for all $q > p$, then $\|f\|_\infty = \lim_{q\to\infty}\|f\|_q$.
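Exercise 7 can be seen numerically on a finite measure space with positive point masses, using hypothetical data. The code factors out $M = \max|f|$ before raising to the power $q$, which avoids floating-point overflow for large $q$. This rescaling is a numerical convenience, not part of the exercise.

```python
def lq_norm(f, w, q):
    """||f||_q for point masses w[i] > 0, computed as
    M * (sum_i w_i (|f_i|/M)^q)^{1/q} with M = max |f_i|, so large q cannot overflow."""
    M = max(abs(fi) for fi, wi in zip(f, w) if wi > 0)
    return M * sum(wi * (abs(fi) / M) ** q for fi, wi in zip(f, w) if wi > 0) ** (1.0 / q)

f = [3.0, -1.0, 2.5, 0.2]
w = [0.2, 1.0, 0.7, 3.0]   # ||f||_inf = 3.0, attained on a point of mass 0.2
for q in (1, 10, 100, 1000, 10000):
    print(q, lq_norm(f, w, q))   # approaches 3.0
```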


8. Suppose $\mu(X) = 1$ and $f \in L^p$ for some $p > 0$, so that $f \in L^q$ for $0 < q < p$.
a. $\log\|f\|_q \ge \int\log|f|$. (Use Exercise 42d in §3.5, with $F(t) = e^t$.)
b. $\left(\int|f|^q - 1\right)/q \ge \log\|f\|_q$, and $\left(\int|f|^q - 1\right)/q \to \int\log|f|$ as $q \to 0$.
c. $\lim_{q\to 0}\|f\|_q = \exp\left(\int\log|f|\right)$.
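Here is a sketch of Exercise 8c on a hypothetical four-point probability space, whose masses sum to 1 up to rounding. As $q \to 0$, $\|f\|_q$ should approach the geometric mean $\exp(\int\log|f|)$. By part (a), $\|f\|_q$ also stays at or above that geometric mean.

```python
import math

# Hypothetical probability space: four points whose masses sum to 1 (up to rounding).
f = [0.5, 2.0, 4.0, 1.0]
w = [0.1, 0.4, 0.3, 0.2]

def lq_norm(q):
    """||f||_q = (sum_i w_i |f_i|^q)^{1/q} for the probability weights w."""
    return sum(wi * abs(fi) ** q for fi, wi in zip(f, w)) ** (1.0 / q)

geometric_mean = math.exp(sum(wi * math.log(abs(fi)) for fi, wi in zip(f, w)))
for q in (1.0, 0.1, 0.01, 1e-4, 1e-6):
    print(q, lq_norm(q), geometric_mean)   # ||f||_q decreases toward exp(int log|f|)
```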


9. Suppose $1 \le p < \infty$. If $\|f_n - f\|_p \to 0$, then $f_n \to f$ in measure, and hence some subsequence converges to $f$ a.e. On the other hand, if $f_n \to f$ in measure and $|f_n| \le g \in L^p$ for all $n$, then $\|f_n - f\|_p \to 0$.


10. Suppose $1 \le p < \infty$. If $f_n, f \in L^p$ and $f_n \to f$ a.e., then $\|f_n - f\|_p \to 0$ iff $\|f_n\|_p \to \|f\|_p$. (Use Exercise 20 in §2.3.)


11. If $f$ is a measurable function on $X$, define the essential range $R_f$ of $f$ to be the set of all $z \in \mathbb{C}$ such that $\{x : |f(x) - z| < \epsilon\}$ has positive measure for all $\epsilon > 0$.
a. $R_f$ is closed.
b. If $f \in L^\infty$, then $R_f$ is compact and $\|f\|_\infty = \max\{|z| : z \in R_f\}$.


12. If $p \ne 2$, the $L^p$ norm does not arise from an inner product on $L^p$, except in trivial cases when $\dim(L^p) \le 1$. (Show that the parallelogram law fails.)
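The hint can be checked concretely. Let $f$ and $g$ be the indicator functions of two disjoint sets of measure 1, modeled here (an illustrative assumption) by counting measure on two points. Then $\|f \pm g\|_p = 2^{1/p}$ and $\|f\|_p = \|g\|_p = 1$. So the parallelogram law $\|f+g\|^2 + \|f-g\|^2 = 2(\|f\|^2 + \|g\|^2)$ holds only for $p = 2$.

```python
import math

def lp(v, p):
    """||v||_p for counting measure; p may be math.inf."""
    if p == math.inf:
        return max(abs(x) for x in v)
    return sum(abs(x) ** p for x in v) ** (1.0 / p)

# Indicators of two disjoint sets of measure 1 (requires dim(L^p) >= 2).
f, g = [1.0, 0.0], [0.0, 1.0]

def parallelogram_gap(p):
    """||f+g||_p^2 + ||f-g||_p^2 - 2(||f||_p^2 + ||g||_p^2) for the f, g above."""
    plus = [a + b for a, b in zip(f, g)]
    minus = [a - b for a, b in zip(f, g)]
    return lp(plus, p) ** 2 + lp(minus, p) ** 2 - 2 * (lp(f, p) ** 2 + lp(g, p) ** 2)

for p in (1, 1.5, 2, 3, math.inf):
    print(p, parallelogram_gap(p))   # equals 2 * 2**(2/p) - 4, which is 0 only for p = 2
```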


13. $L^p(\mathbb{R}^n, m)$ is separable for $1 \le p < \infty$. However, $L^\infty(\mathbb{R}^n, m)$ is not separable. (There is an uncountable set $\mathcal{F} \subset L^\infty$ such that $\|f - g\|_\infty \ge 1$ for all $f, g \in \mathcal{F}$ with $f \ne g$.)


14. If $g \in L^\infty$, the operator $T$ defined by $Tf = fg$ is bounded on $L^p$ for $1 \le p \le \infty$. Its operator norm is at most $\|g\|_\infty$, with equality if $\mu$ is semifinite.


15. (The Vitali Convergence Theorem) Suppose $1 \le p < \infty$ and $\{f_n\} \subset L^p$. In order for $\{f_n\}$ to be Cauchy in the $L^p$ norm, it is necessary and sufficient for the following three conditions to hold: (i) $\{f_n\}$ is Cauchy in measure; (ii) the sequence $\{|f_n|^p\}$ is uniformly integrable (see Exercise 11 in §3.2); and (iii) for every $\epsilon > 0$ there exists $E \subset X$ such that $\mu(E) < \infty$ and $\int_{E^c}|f_n|^p < \epsilon$ for all $n$. (To prove the sufficiency: Given $\epsilon > 0$, let $E$ be as in (iii), and let $A_{mn} = \{x \in E : |f_m(x) - f_n(x)| \ge \epsilon\}$. Then the integrals of $|f_n - f_m|^p$ over $E \setminus A_{mn}$, $A_{mn}$, and $E^c$ are small when $m$ and $n$ are large, for three different reasons.)


16. If $0 < p < 1$, the formula $\rho(f,g) = \int|f - g|^p$ defines a metric on $L^p$ that makes $L^p$ into a complete topological vector space. (The proof of Theorem 6.6 still works for $p < 1$ if $\|f\|_p$ is replaced by $\int|f|^p$, as it uses only the triangle inequality and not the homogeneity of the norm.)


6.2 THE DUAL OF $L^p$


Suppose that $p$ and $q$ are conjugate exponents. Hölder’s inequality shows that each $g \in L^q$ defines a bounded linear functional $\phi_g$ on $L^p$ by
$$\phi_g(f) = \int fg,$$
and the operator norm of $\phi_g$ is at most $\|g\|_q$. (If $p = 2$ and we are thinking of $L^2$ as a Hilbert space, it is more appropriate to define $\phi_g(f) = \int f\bar{g}$. The same convention can be used for $p \ne 2$ without changing the results below in an essential way.) In fact, the map $g \mapsto \phi_g$ is almost always an isometry from $L^q$ into $(L^p)^*$.


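6.13 Proposition. Suppose that $p$ and $q$ are conjugate exponents and $1 \le q < \infty$. If $g \in L^q$, then
$$\|g\|_q = \|\phi_g\| = \sup\left\{\left|\int fg\right| : \|f\|_p = 1\right\}.$$
If $\mu$ is semifinite, this result holds also for $q = \infty$.

Proof. Hölder’s inequality says that $\|\phi_g\| \le \|g\|_q$, and equality is trivial if $g = 0$ (a.e.). If $g \ne 0$ and $q < \infty$, let
$$f = \frac{|g|^{q-1}\,\overline{\operatorname{sgn} g}}{\|g\|_q^{q-1}}.$$
Then
$$\|f\|_p^p = \frac{\int|g|^{(q-1)p}}{\|g\|_q^{(q-1)p}} = \frac{\int|g|^q}{\|g\|_q^q} = 1,$$
so
$$\|\phi_g\| \ge \int fg = \frac{\int|g|^q}{\|g\|_q^{q-1}} = \|g\|_q.$$
(If $q = 1$, then $f = \overline{\operatorname{sgn} g}$, $\|f\|_\infty = 1$, and $\int fg = \|g\|_1$.) If $q = \infty$, for $\epsilon > 0$ let $A = \{x : |g(x)| > \|g\|_\infty - \epsilon\}$. Then $\mu(A) > 0$, so if $\mu$ is semifinite there exists $B \subset A$ with $0 < \mu(B) < \infty$. Let $f = \mu(B)^{-1}\chi_B\,\overline{\operatorname{sgn} g}$; then $\|f\|_1 = 1$, so
$$\|\phi_g\| \ge \int fg = \frac{1}{\mu(B)}\int_B|g| \ge \|g\|_\infty - \epsilon.$$
Since $\epsilon$ is arbitrary, $\|\phi_g\| = \|g\|_\infty$. ∎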
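The extremal function in the proof can be checked numerically. The sketch assumes a finite measure space with random point masses and a random complex $g$; all of this is hypothetical data. It takes $q = 1.5$, hence $p = 3$, and verifies that $\|f\|_p = 1$ and $\int fg = \|g\|_q$ up to rounding.

```python
import random

def lp_norm(f, w, p):
    """||f||_p for point masses w[i] > 0 (finite p)."""
    return sum(wi * abs(fi) ** p for fi, wi in zip(f, w)) ** (1.0 / p)

def sgn(z):
    """sgn z = z/|z| for z != 0, and sgn 0 = 0."""
    return z / abs(z) if z != 0 else 0

random.seed(4)
n = 30
w = [random.uniform(0.1, 2.0) for _ in range(n)]
g = [complex(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(n)]
q = 1.5
p = q / (q - 1)   # conjugate exponent, here 3

g_q = lp_norm(g, w, q)
# f = |g|^{q-1} conj(sgn g) / ||g||_q^{q-1}, the extremal function from the proof
f = [abs(gi) ** (q - 1) * sgn(gi).conjugate() / g_q ** (q - 1) for gi in g]
pairing = sum(wi * fi * gi for fi, gi, wi in zip(f, g, w))   # integral of f g

print(lp_norm(f, w, p))   # 1 up to rounding
print(pairing, g_q)       # real part equals ||g||_q, imaginary part is ~0
```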
Conversely, if $f \mapsto \int fg$ is a bounded linear functional on $L^p$, then $g \in L^q$ in almost all cases. In fact, we have the following stronger result.


6.14 Theorem. Let $p$ and $q$ be conjugate exponents. Suppose that $g$ is a measurable function on $X$ such that $fg \in L^1$ for all $f$ in the space $\Sigma$ of simple functions that vanish outside a set of finite measure, and that the quantity
$$M_q(g) = \sup\left\{\left|\int fg\right| : f \in \Sigma \text{ and } \|f\|_p = 1\right\}$$
is finite. Also, suppose either that $S_g = \{x : g(x) \ne 0\}$ is $\sigma$-finite or that $\mu$ is semifinite. Then $g \in L^q$ and $M_q(g) = \|g\|_q$.

Proof. First, we remark that if $f$ is a bounded measurable function that vanishes outside a set $E$ of finite measure and $\|f\|_p = 1$, then $|\int fg| \le M_q(g)$. Indeed, by Theorem 2.10 there is a sequence $\{f_n\}$ of simple functions such that $|f_n| \le |f|$ (in particular, $f_n$ vanishes outside $E$) and $f_n \to f$ a.e. Since $|f_n| \le \|f\|_\infty\chi_E$ and $\chi_E g \in L^1$, by the dominated convergence theorem we have $|\int fg| = \lim|\int f_n g| \le M_q(g)$.

Now suppose that $q < \infty$. We may assume that $S_g$ is $\sigma$-finite, as this condition automatically holds when $\mu$ is semifinite; see Exercise 17. Let $\{E_n\}$ be an increasing sequence of sets of finite measure such that $S_g = \bigcup_1^\infty E_n$. Let $\{\phi_n\}$ be a sequence of simple functions such that $\phi_n \to g$ pointwise and $|\phi_n| \le |g|$, and let $g_n = \phi_n\chi_{E_n}$. Then $g_n \to g$ pointwise, $|g_n| \le |g|$, and $g_n$ vanishes outside $E_n$. Let
$$f_n = \frac{|g_n|^{q-1}\,\overline{\operatorname{sgn} g}}{\|g_n\|_q^{q-1}}.$$
Then as in the proof of Proposition 6.13 we have $\|f_n\|_p = 1$, and by Fatou’s lemma,
$$\|g\|_q \le \liminf\|g_n\|_q = \liminf\int|f_n g_n| \le \liminf\int|f_n g| = \liminf\int f_n g \le M_q(g).$$
(For the last estimate we used the remark at the beginning of the proof.) On the other hand, Hölder’s inequality gives $M_q(g) \le \|g\|_q$, so the proof is complete for the case $q < \infty$.

Now suppose $q = \infty$. Given $\epsilon > 0$, let $A = \{x : |g(x)| \ge M_\infty(g) + \epsilon\}$. If $\mu(A)$ were positive, we could choose $B \subset A$ with $0 < \mu(B) < \infty$ (either because $\mu$ is semifinite or because $A \subset S_g$). Setting $f = \mu(B)^{-1}\chi_B\,\overline{\operatorname{sgn} g}$, we would then have $\|f\|_1 = 1$ and $\int fg = \mu(B)^{-1}\int_B|g| \ge M_\infty(g) + \epsilon$. But this is impossible by the remark at the beginning of the proof. Hence $\|g\|_\infty \le M_\infty(g)$, and the reverse inequality is obvious. ∎


The last and deepest part of the description of $(L^p)^*$ is the fact that the map $g \mapsto \phi_g$ is, in almost all cases, a surjection.
6.15 Theorem. Let $p$ and $q$ be conjugate exponents. If $1 < p < \infty$, for each $\phi \in (L^p)^*$ there exists $g \in L^q$ such that $\phi(f) = \int fg$ for all $f \in L^p$, and hence $L^q$ is isometrically isomorphic to $(L^p)^*$. The same conclusion holds for $p = 1$ provided $\mu$ is $\sigma$-finite.

Proof. First let us suppose that $\mu$ is finite, so that all simple functions are in $L^p$. If $\phi \in (L^p)^*$ and $E$ is a measurable set, let $\nu(E) = \phi(\chi_E)$. For any disjoint sequence $\{E_j\}$, if $E = \bigcup_1^\infty E_j$ we have $\chi_E = \sum_1^\infty\chi_{E_j}$, where the series converges in the $L^p$ norm:
$$\Big\|\chi_E - \sum_1^n\chi_{E_j}\Big\|_p = \Big\|\sum_{n+1}^\infty\chi_{E_j}\Big\|_p = \Big[\mu\Big(\bigcup_{n+1}^\infty E_j\Big)\Big]^{1/p} \to 0 \text{ as } n \to \infty.$$
(It is at this point that we need the assumption that $p < \infty$.) Hence, since $\phi$ is linear and continuous,
$$\nu(E) = \sum_1^\infty\phi(\chi_{E_j}) = \sum_1^\infty\nu(E_j),$$
so that $\nu$ is a complex measure. Also, if $\mu(E) = 0$, then $\chi_E = 0$ as an element of $L^p$, so $\nu(E) = 0$; that is, $\nu \ll \mu$. By the Radon-Nikodym theorem there exists $g \in L^1(\mu)$ such that $\phi(\chi_E) = \nu(E) = \int_E g\,d\mu$ for all $E$, and hence $\phi(f) = \int fg\,d\mu$ for all simple functions $f$. Moreover, $|\int fg| \le \|\phi\|\,\|f\|_p$, so $g \in L^q$ by Theorem 6.14. Once we know this, it follows from Proposition 6.7 that $\phi(f) = \int fg$ for all $f \in L^p$.

Now suppose that $\mu$ is $\sigma$-finite. Let $\{E_n\}$ be an increasing sequence of sets such that $0 < \mu(E_n) < \infty$ and $X = \bigcup_1^\infty E_n$. Let us agree to identify $L^p(E_n)$ and $L^q(E_n)$ with the subspaces of $L^p(X)$ and $L^q(X)$ consisting of functions that vanish outside $E_n$. The preceding argument shows that for each $n$ there exists $g_n \in L^q(E_n)$ such that $\phi(f) = \int fg_n$ for all $f \in L^p(E_n)$, and $\|g_n\|_q = \|\phi|L^p(E_n)\| \le \|\phi\|$. The function $g_n$ is unique modulo alterations on null sets, so $g_n = g_m$ a.e. on $E_n$ for $n < m$, and we can define $g$ a.e. on $X$ by setting $g = g_n$ on $E_n$. By the monotone convergence theorem, $\|g\|_q = \lim\|g_n\|_q \le \|\phi\|$, so $g \in L^q$. Moreover, if $f \in L^p$, then by the dominated convergence theorem, $f\chi_{E_n} \to f$ in the $L^p$ norm and hence
$$\phi(f) = \lim\phi(f\chi_{E_n}) = \lim\int_{E_n} fg = \int fg.$$

Finally, suppose that $\mu$ is arbitrary and $p > 1$, so that $q < \infty$. As above, for each $\sigma$-finite set $E \subset X$ there is an a.e.-unique $g_E \in L^q(E)$ such that $\phi(f) = \int fg_E$ for all $f \in L^p(E)$ and $\|g_E\|_q \le \|\phi\|$. If $F$ is $\sigma$-finite and $F \supset E$, then $g_F = g_E$ a.e. on $E$, so $\|g_F\|_q \ge \|g_E\|_q$. Let $M$ be the supremum of $\|g_E\|_q$ as $E$ ranges over all $\sigma$-finite sets, noting that $M \le \|\phi\|$. Choose a sequence $\{E_n\}$ so that $\|g_{E_n}\|_q \to M$, and set $F = \bigcup_1^\infty E_n$. Then $F$ is $\sigma$-finite and $\|g_F\|_q \ge \|g_{E_n}\|_q$ for all $n$, whence $\|g_F\|_q = M$. Now, if $A$ is a $\sigma$-finite set containing $F$, we have
$$\int|g_F|^q + \int|g_{A\setminus F}|^q = \int|g_A|^q \le M^q = \int|g_F|^q,$$
and thus $g_{A\setminus F} = 0$ and $g_A = g_F$ a.e. (Here we use the fact that $q < \infty$.) But if $f \in L^p$, then $A = F \cup \{x : f(x) \ne 0\}$ is $\sigma$-finite, so $\phi(f) = \int fg_A = \int fg_F$. Thus we may take $g = g_F$, and the proof is complete. ∎
6.16 Corollary. If $1 < p < \infty$, $L^p$ is reflexive.


We conclude with some remarks on the exceptional cases $p = 1$ and $p = \infty$. For any measure $\mu$, the correspondence $g \mapsto \phi_g$ maps $L^\infty$ into $(L^1)^*$, but in general it is neither injective nor surjective. Injectivity fails when $\mu$ is not semifinite. Indeed, suppose $E \subset X$ is a set of infinite measure that contains no subsets of positive finite measure. If $f \in L^1$, then $\{x : f(x) \ne 0\}$ is $\sigma$-finite and hence intersects $E$ in a null set. It follows that $\phi_{\chi_E} = 0$ although $\chi_E \ne 0$ in $L^\infty$. This problem, however, can be remedied by redefining $L^\infty$; see Exercises 23-24. The failure of surjectivity is more subtle and is best illustrated by an example; see also Exercise 25.

Let $X$ be an uncountable set, $\mu$ = counting measure on $(X, \mathcal{P}(X))$, $\mathcal{M}$ = the $\sigma$-algebra of countable or co-countable sets, and $\mu_0$ = the restriction of $\mu$ to $\mathcal{M}$. Every $f \in L^1(\mu)$ vanishes outside a countable set, and it follows that $L^1(\mu) = L^1(\mu_0)$. On the other hand, $L^\infty(\mu)$ consists of all bounded functions on $X$, whereas $L^\infty(\mu_0)$ consists of those bounded functions that are constant except on a countable set. With this in mind, it is easy to see that the dual of $L^1(\mu_0)$ is $L^\infty(\mu)$ and not the smaller space $L^\infty(\mu_0)$.

As for the case $p = \infty$: the map $g \mapsto \phi_g$ is always an isometric injection of $L^1$ into $(L^\infty)^*$ by Proposition 6.13, but it is almost never a surjection. We shall say more about this in §6.6; for the present, we give a specific example. (Another example can be found in Exercise 19.)

Let $X = [0,1]$ and $\mu$ = Lebesgue measure. The map $f \mapsto f(0)$ is a bounded linear functional on $C(X)$, which we regard as a subspace of $L^\infty$. By the Hahn-Banach theorem there exists $\phi \in (L^\infty)^*$ such that $\phi(f) = f(0)$ for all $f \in C(X)$. To see that $\phi$ cannot be given by integration against an $L^1$ function, consider the functions $f_n \in C(X)$ defined by $f_n(x) = \max(1 - nx, 0)$. Then $\phi(f_n) = f_n(0) = 1$ for all $n$, but $f_n(x) \to 0$ for all $x > 0$, so by the dominated convergence theorem, $\int f_n g \to 0$ for all $g \in L^1$.


Exercises 


17. With notation as in Theorem 6.14, if $\mu$ is semifinite, $q < \infty$, and $M_q(g) < \infty$, then $\{x : |g(x)| > \epsilon\}$ has finite measure for all $\epsilon > 0$, and hence $S_g$ is $\sigma$-finite.


18. The self-duality of $L^2$ follows from Hilbert space theory (Theorem 5.25), and this fact can be used to prove the Lebesgue-Radon-Nikodym theorem by the following argument due to von Neumann. Suppose that $\mu, \nu$ are positive finite measures on $(X, \mathcal{M})$ (the $\sigma$-finite case follows easily as in §3.2), and let $\lambda = \mu + \nu$.
a. The map $f \mapsto \int f\,d\nu$ is a bounded linear functional on $L^2(\lambda)$, so $\int f\,d\nu = \int fg\,d\lambda$ for some $g \in L^2(\lambda)$. Equivalently, $\int f(1-g)\,d\nu = \int fg\,d\mu$ for $f \in L^2(\lambda)$.
b. $0 \le g \le 1$ $\lambda$-a.e., so we may assume $0 \le g \le 1$ everywhere.
c. Let $A = \{x : g(x) < 1\}$ and $B = \{x : g(x) = 1\}$, and set $\nu_a(E) = \nu(A \cap E)$, $\nu_s(E) = \nu(B \cap E)$. Then $\nu_s \perp \mu$ and $\nu_a \ll \mu$; in fact, $d\nu_a = g(1-g)^{-1}\chi_A\,d\mu$.
19. Define $\phi_n \in (l^\infty)^*$ by $\phi_n(f) = n^{-1}\sum_1^n f(j)$. Then the sequence $\{\phi_n\}$ has a weak* cluster point $\phi$, and $\phi$ is an element of $(l^\infty)^*$ that does not arise from an element of $l^1$.


20. Suppose $\sup_n\|f_n\|_p < \infty$ and $f_n \to f$ a.e.
a. If $1 < p < \infty$, then $f_n \to f$ weakly in $L^p$. (Given $g \in L^q$, where $q$ is conjugate to $p$, and $\epsilon > 0$, there exist (i) $\delta > 0$ such that $\int_E|g|^q < \epsilon$ whenever $\mu(E) < \delta$, (ii) $A \subset X$ such that $\mu(A) < \infty$ and $\int_{X\setminus A}|g|^q < \epsilon$, and (iii) $B \subset A$ such that $\mu(A \setminus B) < \delta$ and $f_n \to f$ uniformly on $B$.)
b. The result of (a) is false in general for $p = 1$. (Find counterexamples in $L^1(\mathbb{R}, m)$ and $l^1$.) It is, however, true for $p = \infty$ if $\mu$ is $\sigma$-finite and weak convergence is replaced by weak* convergence.


21. If $1 < p < \infty$, then $f_n \to f$ weakly in $l^p(A)$ iff $\sup_n\|f_n\|_p < \infty$ and $f_n \to f$ pointwise.


22. Let $X = [0,1]$, with Lebesgue measure.
a. Let $f_n(x) = \cos 2\pi nx$. Then $f_n \to 0$ weakly in $L^2$ (see Exercise 63 in §5.5), but $f_n \not\to 0$ a.e. or in measure.
b. Let $f_n = n\chi_{(0,1/n)}$. Then $f_n \to 0$ a.e. and in measure, but $f_n \not\to 0$ weakly in $L^p$ for any $p$.


23. Let $(X, \mathcal{M}, \mu)$ be a measure space. A set $E \in \mathcal{M}$ is called locally null if $\mu(E \cap F) = 0$ for every $F \in \mathcal{M}$ such that $\mu(F) < \infty$. If $f : X \to \mathbb{C}$ is a measurable function, define
$$\|f\|_* = \inf\{a : \{x : |f(x)| > a\} \text{ is locally null}\},$$
and let $\mathcal{L}^\infty = \mathcal{L}^\infty(X, \mathcal{M}, \mu)$ be the space of all measurable $f$ such that $\|f\|_* < \infty$. We consider $f, g \in \mathcal{L}^\infty$ to be identical if $\{x : f(x) \ne g(x)\}$ is locally null.
a. If $E$ is locally null, then $\mu(E)$ is either 0 or $\infty$. If $\mu$ is semifinite, then every locally null set is null.
b. $\|\cdot\|_*$ is a norm on $\mathcal{L}^\infty$ that makes $\mathcal{L}^\infty$ into a Banach space. If $\mu$ is semifinite, then $\mathcal{L}^\infty = L^\infty$.


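24. If $g \in \mathcal{L}^\infty$ (see Exercise 23), then $\|g\|_* = \sup\{|\int fg| : \|f\|_1 = 1\}$, so the map $g \mapsto \phi_g$ is an isometry from $\mathcal{L}^\infty$ into $(L^1)^*$. Conversely, if $M_\infty(g) < \infty$ as in Theorem 6.14, then $g \in \mathcal{L}^\infty$ and $M_\infty(g) = \|g\|_*$.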


25. Suppose $\mu$ is decomposable (see Exercise 15 in §3.2). Then every $\phi \in (L^1)^*$ is of the form $\phi(f) = \int fg$ for some $g \in \mathcal{L}^\infty$, and hence $(L^1)^* = \mathcal{L}^\infty$ (see Exercises 23 and 24). (If $\mathcal{F}$ is a decomposition of $\mu$ and $f \in L^1$, there exists $\{E_j\} \subset \mathcal{F}$ such that $f = \sum_1^\infty f\chi_{E_j}$, where the series converges in $L^1$.)
6.3 SOME USEFUL INEQUALITIES


Estimates and inequalities lie at the heart of the applications of $L^p$ spaces in analysis. The most basic of these are the Hölder and Minkowski inequalities. In this section we present a few additional important results in this area. The first one is almost a triviality, but it is sufficiently useful to warrant special mention.


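6.17 Chebyshev’s Inequality. If $f \in L^p$ ($0 < p < \infty$), then for any $\alpha > 0$,
$$\mu(\{x : |f(x)| > \alpha\}) \le \left(\frac{\|f\|_p}{\alpha}\right)^p.$$

Proof. Let $E_\alpha = \{x : |f(x)| > \alpha\}$. Then
$$\|f\|_p^p = \int|f|^p \ge \int_{E_\alpha}|f|^p \ge \alpha^p\int_{E_\alpha}1 = \alpha^p\mu(E_\alpha). \qquad ∎$$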
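Here is a numerical spot check of Chebyshev’s inequality on a hypothetical finite measure space with random point masses and values. The helper `level_set_measure` is illustrative and computes $\mu(E_\alpha)$.

```python
import random

def lp_norm(f, w, p):
    """||f||_p for point masses w[i] > 0 (finite p > 0)."""
    return sum(wi * abs(fi) ** p for fi, wi in zip(f, w)) ** (1.0 / p)

def level_set_measure(f, w, alpha):
    """mu({x : |f(x)| > alpha}) for point masses w."""
    return sum(wi for fi, wi in zip(f, w) if abs(fi) > alpha)

random.seed(6)
w = [random.uniform(0.1, 1.0) for _ in range(200)]
f = [random.gauss(0, 3) for _ in range(200)]

for p in (0.5, 1.0, 2.0):
    for alpha in (0.5, 1.0, 3.0, 10.0):
        bound = (lp_norm(f, w, p) / alpha) ** p
        assert level_set_measure(f, w, alpha) <= bound * (1 + 1e-12)
```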


The next result is a rather general theorem about boundedness of integral operators on $L^p$ spaces.

6.18 Theorem. Let $(X, \mathcal{M}, \mu)$ and $(Y, \mathcal{N}, \nu)$ be $\sigma$-finite measure spaces, and let $K$ be an $(\mathcal{M}\otimes\mathcal{N})$-measurable function on $X \times Y$. Suppose that there exists $C > 0$ such that $\int|K(x,y)|\,d\mu(x) \le C$ for a.e. $y \in Y$ and $\int|K(x,y)|\,d\nu(y) \le C$ for a.e. $x \in X$, and that $1 \le p \le \infty$. If $f \in L^p(\nu)$, the integral
$$Tf(x) = \int K(x,y)f(y)\,d\nu(y)$$
converges absolutely for a.e. $x \in X$, the function $Tf$ thus defined is in $L^p(\mu)$, and $\|Tf\|_p \le C\|f\|_p$.


Proof. Suppose that $1 < p < \infty$. Let $q$ be the conjugate exponent to $p$. By applying Hölder's inequality to the product

$$|K(x,y)f(y)| = |K(x,y)|^{1/q}\bigl(|K(x,y)|^{1/p}|f(y)|\bigr),$$

we have

$$\int |K(x,y)f(y)|\,d\nu(y) \le \left[\int |K(x,y)|\,d\nu(y)\right]^{1/q}\left[\int |K(x,y)||f(y)|^p\,d\nu(y)\right]^{1/p} \le C^{1/q}\left[\int |K(x,y)||f(y)|^p\,d\nu(y)\right]^{1/p}$$

for a.e. $x \in X$. Hence, by Tonelli's theorem,

$$\int\left[\int |K(x,y)f(y)|\,d\nu(y)\right]^p d\mu(x) \le C^{p/q}\iint |K(x,y)||f(y)|^p\,d\nu(y)\,d\mu(x) \le C^{(p/q)+1}\int |f(y)|^p\,d\nu(y).$$

Since the last integral is finite, Fubini's theorem implies that $K(x,\cdot)f \in L^1(\nu)$ for a.e. $x$, so that $Tf$ is well defined a.e., and, since $(p/q) + 1 = p$,

$$\int |Tf(x)|^p\,d\mu(x) \le C^p\|f\|_p^p.$$

Taking $p$th roots, we are done.

For $p = 1$ the proof is similar but easier and requires only the hypothesis $\int |K(x,y)|\,d\mu(x) \le C$; for $p = \infty$ the proof is trivial and requires only the hypothesis $\int |K(x,y)|\,d\nu(y) \le C$. Details are left to the reader (Exercise 26). ∎
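When counting measure is used on finite sets, Theorem 6.18 (sometimes called the Schur test) says the following. If every absolute row sum and every absolute column sum of a matrix is at most $C$, then the matrix has norm at most $C$ on every $\ell^p$. The sketch below checks this numerically. The helper names, matrix sizes and random entries are illustrative assumptions.

```python
import math
import random


def schur_constant(K):
    """Smallest C bounding every absolute row sum and column sum of K."""
    rows = max(sum(abs(k) for k in row) for row in K)  # integrals over y
    cols = max(sum(abs(row[j]) for row in K) for j in range(len(K[0])))  # integrals over x
    return max(rows, cols)


def apply_kernel(K, f):
    """Discrete analogue of Tf(x) = sum_y K(x, y) f(y) (counting measure)."""
    return [sum(k * fy for k, fy in zip(row, f)) for row in K]


def lp_norm(v, p):
    if p == math.inf:
        return max(abs(x) for x in v)
    return sum(abs(x) ** p for x in v) ** (1 / p)


def check_schur(trials=200, seed=1):
    """Test ||Kf||_p <= C ||f||_p on random matrices for several p."""
    rng = random.Random(seed)
    for _ in range(trials):
        m, n = rng.randint(1, 6), rng.randint(1, 6)
        K = [[rng.uniform(-1, 1) for _ in range(n)] for _ in range(m)]
        f = [rng.uniform(-3, 3) for _ in range(n)]
        C = schur_constant(K)
        for p in (1, 1.5, 2, 3, math.inf):
            if lp_norm(apply_kernel(K, f), p) > C * lp_norm(f, p) + 1e-9:
                return False
    return True
```

For example, `schur_constant([[1, 2], [3, 4]])` is 7, the larger of the maximal row sum 7 and column sum 6.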


Minkowski's inequality states that the $L^p$ norm of a sum is at most the sum of the $L^p$ norms. There is a generalization of this result in which sums are replaced by integrals:


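6.19 Minkowski's Inequality for Integrals. Suppose that $(X,\mathcal{M},\mu)$ and $(Y,\mathcal{N},\nu)$ are $\sigma$-finite measure spaces, and let $f$ be an $(\mathcal{M}\otimes\mathcal{N})$-measurable function on $X \times Y$.

a. If $f \ge 0$ and $1 \le p < \infty$, then

$$\left[\int\left(\int f(x,y)\,d\nu(y)\right)^p d\mu(x)\right]^{1/p} \le \int\left[\int f(x,y)^p\,d\mu(x)\right]^{1/p} d\nu(y).$$

b. If $1 \le p \le \infty$, $f(\cdot,y) \in L^p(\mu)$ for a.e. $y$, and the function $y \mapsto \|f(\cdot,y)\|_p$ is in $L^1(\nu)$, then $f(x,\cdot) \in L^1(\nu)$ for a.e. $x$, the function $x \mapsto \int f(x,y)\,d\nu(y)$ is in $L^p(\mu)$, and

$$\left\|\int f(\cdot,y)\,d\nu(y)\right\|_p \le \int \|f(\cdot,y)\|_p\,d\nu(y).$$

Proof. If $p = 1$, (a) is merely Tonelli's theorem. If $1 < p < \infty$, let $q$ be the conjugate exponent to $p$ and suppose $g \in L^q(\mu)$. Then by Tonelli's theorem and Hölder's inequality,

$$\int\left[\int f(x,y)\,d\nu(y)\right]|g(x)|\,d\mu(x) = \iint f(x,y)|g(x)|\,d\mu(x)\,d\nu(y) \le \|g\|_q\int\left[\int f(x,y)^p\,d\mu(x)\right]^{1/p} d\nu(y).$$

Assertion (a) therefore follows from Theorem 6.14. When $p < \infty$, (b) follows from (a) (with $f$ replaced by $|f|$) and Fubini's theorem; when $p = \infty$, it is a simple consequence of the monotonicity of the integral. ∎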











Our final result is a theorem concerning integral operators on $(0,\infty)$ with Lebesgue measure.










6.20 Theorem. Let $K$ be a Lebesgue measurable function on $(0,\infty) \times (0,\infty)$ such that $K(\lambda x,\lambda y) = \lambda^{-1}K(x,y)$ for all $\lambda > 0$ and $\int_0^\infty |K(x,1)|x^{-1/p}\,dx = C < \infty$ for some $p \in [1,\infty]$, and let $q$ be the conjugate exponent to $p$. For $f \in L^p$ and $g \in L^q$, let

$$Tf(y) = \int_0^\infty K(x,y)f(x)\,dx, \qquad Sg(x) = \int_0^\infty K(x,y)g(y)\,dy.$$

Then $Tf$ and $Sg$ are defined a.e., and $\|Tf\|_p \le C\|f\|_p$ and $\|Sg\|_q \le C\|g\|_q$.


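Proof. Setting $z = x/y$, we have

$$\int_0^\infty |K(x,y)f(x)|\,dx = \int_0^\infty |K(yz,y)f(yz)|\,y\,dz = \int_0^\infty |K(z,1)f_z(y)|\,dz,$$

where $f_z(y) = f(yz)$; moreover,

$$\|f_z\|_p = \left[\int_0^\infty |f(yz)|^p\,dy\right]^{1/p} = \left[\int_0^\infty |f(y)|^p z^{-1}\,dy\right]^{1/p} = z^{-1/p}\|f\|_p.$$

Therefore, by Minkowski's inequality for integrals, $Tf$ exists a.e. and

$$\|Tf\|_p \le \int_0^\infty |K(z,1)|\,\|f_z\|_p\,dz = \|f\|_p\int_0^\infty |K(z,1)|z^{-1/p}\,dz = C\|f\|_p.$$

Finally, setting $z = y^{-1}$, we have

$$\int_0^\infty |K(1,y)|y^{-1/q}\,dy = \int_0^\infty |K(y^{-1},1)|y^{-1-(1/q)}\,dy = \int_0^\infty |K(z,1)|z^{-1/p}\,dz = C,$$

so the same reasoning shows that $Sg$ is defined a.e. and that $\|Sg\|_q \le C\|g\|_q$. ∎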


6.21 Corollary. Let

$$Tf(y) = \frac{1}{y}\int_0^y f(x)\,dx, \qquad Sg(x) = \int_x^\infty \frac{g(y)}{y}\,dy.$$

Then for $1 < p \le \infty$ and $1 \le q < \infty$,

$$\|Tf\|_p \le q\|f\|_p, \qquad \|Sg\|_q \le q\|g\|_q.$$

Proof. Let $K(x,y) = y^{-1}\chi_E(x,y)$ where $E = \{(x,y) : x < y\}$. Then $\int_0^\infty |K(x,1)|x^{-1/p}\,dx = \int_0^1 x^{-1/p}\,dx = p/(p-1) = q$, where $q$ is the conjugate exponent to $p$, so Theorem 6.20 yields the result. ∎
Corollary 6.21 is a special case of Hardy's inequalities; the general result is in
Exercise 29. 
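Corollary 6.21 has a well-known discrete counterpart, the classical discrete Hardy inequality. For $a_n \ge 0$ and $1 < p < \infty$ it states that $\sum_{n\ge1}\bigl(\frac1n\sum_{k\le n}a_k\bigr)^p \le \bigl(\frac{p}{p-1}\bigr)^p\sum_{n\ge1}a_n^p$. This section does not prove that inequality. The sketch below only evaluates both sides for a finite nonnegative sequence, to show the same constant $q = p/(p-1)$ at work. The zero padding of the tail is an assumption of the illustration.

```python
def hardy_sides(a, p, pad=1000):
    """Both sides of the discrete Hardy inequality for finitely many a_n >= 0, 1 < p < inf.

    The sequence is extended by `pad` zeros, which brings the partial sum on the
    left closer to the full series; every omitted term is nonnegative, so the
    partial sum is still bounded by the right side.
    """
    q = p / (p - 1)  # conjugate exponent, the constant in Corollary 6.21
    lhs, partial_sum = 0.0, 0.0
    for n, x in enumerate(list(a) + [0.0] * pad, start=1):
        partial_sum += x
        lhs += (partial_sum / n) ** p  # p-th power of the n-th average
    rhs = q ** p * sum(x ** p for x in a)
    return lhs, rhs
```

For the one-term sequence $a = (1)$ and $p = 2$, the left side is a partial sum of $\sum n^{-2} = \pi^2/6 \approx 1.645$. The right side is $q^p = 4$.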


Exercises
26. Complete the proof of Theorem 6.18 for the cases $p = 1$ and $p = \infty$.


27. (Hilbert's Inequality) The operator $Tf(x) = \int_0^\infty (x+y)^{-1}f(y)\,dy$ satisfies $\|Tf\|_p \le C_p\|f\|_p$ for $1 < p < \infty$, where $C_p = \int_0^\infty x^{-1/p}(x+1)^{-1}\,dx$. (For those who know about contour integrals: Show that $C_p = \pi\csc(\pi/p)$.)


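28. Let $I_\alpha$ be the $\alpha$th fractional integral operator as in Exercise 61 of §2.6, and let $J_\alpha f(x) = x^{-\alpha}I_\alpha f(x)$.

a. $J_\alpha$ is bounded on $L^p(0,\infty)$ for $1 < p \le \infty$; more precisely,

$$\|J_\alpha f\|_p \le \frac{\Gamma(1 - p^{-1})}{\Gamma(\alpha + 1 - p^{-1})}\|f\|_p.$$

b. There exists $f \in L^1(0,\infty)$ such that $J_1 f \notin L^1(0,\infty)$.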


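29. Suppose that $1 \le p < \infty$, $r > 0$, and $h$ is a nonnegative measurable function on $(0,\infty)$. Then:

$$\int_0^\infty\left(\int_0^x h(y)\,dy\right)^p x^{-r-1}\,dx \le \left(\frac{p}{r}\right)^p\int_0^\infty (yh(y))^p y^{-r-1}\,dy,$$

$$\int_0^\infty\left(\int_x^\infty h(y)\,dy\right)^p x^{r-1}\,dx \le \left(\frac{p}{r}\right)^p\int_0^\infty (yh(y))^p y^{r-1}\,dy.$$

(Apply Theorem 6.20 with $K(x,y) = x^{\beta-1}y^{-\beta}\chi_{(0,\infty)}(y-x)$, $f(x) = x^\gamma h(x)$, and $g(x) = x^\delta h(x)$ for suitable $\beta$, $\gamma$, $\delta$.)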

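30. Suppose that $K$ is a nonnegative measurable function on $(0,\infty)$ such that $\int_0^\infty K(x)x^{s-1}\,dx = \phi(s) < \infty$ for $0 < s < 1$.

a. If $1 < p < \infty$, $p^{-1} + q^{-1} = 1$, and $f$, $g$ are nonnegative measurable functions on $(0,\infty)$, then (with $\int = \int_0^\infty$)

$$\iint K(xy)f(x)g(y)\,dx\,dy \le \phi(p^{-1})\left[\int x^{p-2}f(x)^p\,dx\right]^{1/p}\left[\int g(x)^q\,dx\right]^{1/q}.$$

b. The operator $Tf(x) = \int K(xy)f(y)\,dy$ is bounded on $L^2((0,\infty))$ with norm $\le \phi(\frac12)$. (Special case: If $K(x) = e^{-x}$, then $T$ is the Laplace transform and $\phi(s) = \Gamma(s)$.)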


31. (A Generalized Hölder Inequality) Suppose that $1 \le p_j \le \infty$ and $\sum_1^n p_j^{-1} = r^{-1} \le 1$. If $f_j \in L^{p_j}$ for $j = 1,\dots,n$, then $\prod_1^n f_j \in L^r$ and $\|\prod_1^n f_j\|_r \le \prod_1^n \|f_j\|_{p_j}$. (First do the case $n = 2$.)
32. Suppose that $(X,\mathcal{M},\mu)$ and $(Y,\mathcal{N},\nu)$ are $\sigma$-finite measure spaces and $K \in L^2(\mu \times \nu)$. If $f \in L^2(\nu)$, the integral $Tf(x) = \int K(x,y)f(y)\,d\nu(y)$ converges absolutely for a.e. $x \in X$; moreover, $Tf \in L^2(\mu)$ and $\|Tf\|_2 \le \|K\|_2\|f\|_2$.


33. Given $1 < p < \infty$, let $Tf(x) = x^{-1/p}\int_0^x f(t)\,dt$. If $p^{-1} + q^{-1} = 1$, then $T$ is a bounded linear map from $L^q((0,\infty))$ to $C_0((0,\infty))$.


34. If $f$ is absolutely continuous on $[\epsilon,1]$ for $0 < \epsilon < 1$ and $\int_0^1 x|f'(x)|^p\,dx < \infty$, then $\lim_{x\to0}f(x)$ exists (and is finite) if $p > 2$, $|f(x)|/|\log x|^{1/2} \to 0$ as $x \to 0$ if $p = 2$, and $|f(x)|/x^{1-(2/p)} \to 0$ as $x \to 0$ if $p < 2$.
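Exercise 27 asks for the value of $C_p$ by contour integration. The sketch below is a purely numerical cross-check and not a solution of the exercise. It evaluates the defining integral after the substitution $x = e^u$ and compares the result with $\pi\csc(\pi/p)$. The cutoff and step size are ad hoc assumptions chosen for this illustration.

```python
import math


def hilbert_constant(p, half_width=200.0, step=0.01):
    """Numerically evaluate C_p = int_0^inf x^(-1/p) (x + 1)^(-1) dx for 1 < p < inf.

    Substituting x = e^u gives int_R e^((1 - 1/p) u) / (e^u + 1) du, whose integrand
    decays exponentially in both directions, so the trapezoid rule on
    [-half_width, half_width] is very accurate for these parameter choices.
    """
    s = 1 - 1 / p
    n = int(round(2 * half_width / step))
    total = 0.0
    for k in range(n + 1):
        u = -half_width + k * step
        weight = 0.5 if k in (0, n) else 1.0  # trapezoid end-point weights
        total += weight * math.exp(s * u) / (math.exp(u) + 1)
    return total * step
```

For instance, `hilbert_constant(2.0)` should agree with $\pi = \pi\csc(\pi/2)$ to many digits. The values for $p$ and its conjugate exponent coincide, as the formula $\pi\csc(\pi/p)$ predicts.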


6.4 DISTRIBUTION FUNCTIONS AND WEAK $L^p$


If $f$ is a measurable function on $(X,\mathcal{M},\mu)$, we define its distribution function $\lambda_f : (0,\infty) \to [0,\infty]$ by

$$\lambda_f(\alpha) = \mu(\{x : |f(x)| > \alpha\}).$$

(This is closely related, but not identical, to the "distribution functions" discussed in §1.5 and §10.1.) We compile the basic properties of $\lambda_f$ in a proposition:


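6.22 Proposition.
a. $\lambda_f$ is decreasing and right continuous.
b. If $|f| \le |g|$, then $\lambda_f \le \lambda_g$.
c. If $|f_n|$ increases to $|f|$, then $\lambda_{f_n}$ increases to $\lambda_f$.
d. If $f = g + h$, then $\lambda_f(\alpha) \le \lambda_g(\tfrac12\alpha) + \lambda_h(\tfrac12\alpha)$.


Proof. Let $E(\alpha,f) = \{x : |f(x)| > \alpha\}$. The function $\lambda_f$ is decreasing since $E(\alpha,f) \supset E(\beta,f)$ if $\alpha < \beta$, and it is right continuous since $E(\alpha,f)$ is the increasing union of $\{E(\alpha + n^{-1},f)\}_1^\infty$. If $|f| \le |g|$, then $E(\alpha,f) \subset E(\alpha,g)$, so $\lambda_f \le \lambda_g$. If $|f_n|$ increases to $|f|$, then $E(\alpha,f)$ is the increasing union of $\{E(\alpha,f_n)\}$, so $\lambda_{f_n}$ increases to $\lambda_f$. Finally, if $f = g + h$, then $E(\alpha,f) \subset E(\tfrac12\alpha,g) \cup E(\tfrac12\alpha,h)$, which implies that $\lambda_f(\alpha) \le \lambda_g(\tfrac12\alpha) + \lambda_h(\tfrac12\alpha)$. ∎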


Suppose that $\lambda_f(\alpha) < \infty$ for all $\alpha > 0$. In view of Proposition 6.22a, $\lambda_f$ defines a negative Borel measure $\nu$ on $(0,\infty)$ such that $\nu((a,b]) = \lambda_f(b) - \lambda_f(a)$ whenever $0 < a < b$. (Our construction of Borel measures on $\mathbb{R}$ in §1.5 works equally well on $(0,\infty)$.) We can therefore consider the Lebesgue-Stieltjes integrals $\int \phi\,d\lambda_f = \int \phi\,d\nu$ of functions $\phi$ on $(0,\infty)$. The following result shows that the integrals of functions of $|f|$ on $X$ can be reduced to such Lebesgue-Stieltjes integrals.


6.23 Proposition. If $\lambda_f(\alpha) < \infty$ for all $\alpha > 0$ and $\phi$ is a nonnegative Borel measurable function on $(0,\infty)$, then

$$\int_X \phi\circ|f|\,d\mu = -\int_0^\infty \phi(\alpha)\,d\lambda_f(\alpha).$$
Proof. If $\nu$ is the negative measure determined by $\lambda_f$, we have

$$\nu((a,b]) = \lambda_f(b) - \lambda_f(a) = -\mu(\{x : a < |f(x)| \le b\}) = -\mu(|f|^{-1}((a,b])).$$

It follows that $\nu(E) = -\mu(|f|^{-1}(E))$ for all Borel sets $E \subset (0,\infty)$, by the uniqueness of extensions (Theorem 1.14). But this means that $\int_X \phi\circ|f|\,d\mu = -\int_0^\infty \phi(\alpha)\,d\lambda_f(\alpha)$ when $\phi$ is the characteristic function of a Borel set, and hence when $\phi$ is simple. The general case then follows by virtue of Theorem 2.10 and the monotone convergence theorem. ∎


The case of this result in which we are most interested is $\phi(\alpha) = \alpha^p$, which gives

$$\int |f|^p\,d\mu = -\int_0^\infty \alpha^p\,d\lambda_f(\alpha).$$

A more useful form of this equation is obtained by integrating the right side by parts (Theorem 3.36) to obtain $\int |f|^p\,d\mu = p\int_0^\infty \alpha^{p-1}\lambda_f(\alpha)\,d\alpha$. The validity of this calculation is not clear unless we know that $\alpha^p\lambda_f(\alpha) \to 0$ as $\alpha \to 0$ and $\alpha \to \infty$; nonetheless, the conclusion is correct.


6.24 Proposition. If $0 < p < \infty$, then

$$\int |f|^p\,d\mu = p\int_0^\infty \alpha^{p-1}\lambda_f(\alpha)\,d\alpha.$$

Proof. If $\lambda_f(\alpha) = \infty$ for some $\alpha > 0$, then both integrals are infinite. If not, and $f$ is simple, then $\lambda_f$ is bounded as $\alpha \to 0$ and vanishes for $\alpha$ sufficiently large, so the integration by parts described above works. (It is also easy to verify the formula directly in this case.) For the general case, let $\{g_n\}$ be a sequence of simple functions that increases to $|f|$; then the desired result is true for $g_n$, and it follows for $f$ by Proposition 6.22c and the monotone convergence theorem. ∎
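For a simple function the identity of Proposition 6.24 can be checked exactly, because $\lambda_f$ is then a step function. The sketch below assumes a finite atomic measure space for illustration. Between consecutive values of $|f|$ the distribution function is constant, and $p\int_a^b \alpha^{p-1}\,d\alpha = b^p - a^p$, so the right side becomes a finite sum.

```python
def layer_cake_sides(values, masses, p):
    """Return (int |f|^p dmu, p * int_0^inf alpha^(p-1) lambda_f(alpha) d alpha).

    Hypothetical finite atomic space: atom i has mass masses[i] and f = values[i].
    """
    def lam(alpha):
        # distribution function: total mass where |f| > alpha
        return sum(m for v, m in zip(values, masses) if abs(v) > alpha)

    direct = sum(m * abs(v) ** p for v, m in zip(values, masses))
    # lambda_f is constant on [a, b) between consecutive levels of |f| and vanishes
    # beyond the largest level; p * int_a^b alpha^(p-1) d alpha = b**p - a**p.
    levels = sorted({0.0} | {abs(v) for v in values})
    layered = sum(lam(a) * (b ** p - a ** p) for a, b in zip(levels, levels[1:]))
    return direct, layered
```

With values $3, -1, \frac12, 2$, masses $\frac12, 1, 2, \frac14$ and $p = 2$, both sides come out to $7$.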


A variant of the $L^p$ spaces that turns up rather often is the following. If $f$ is a measurable function on $X$ and $0 < p < \infty$, we define

$$[f]_p = \left(\sup_{\alpha > 0}\alpha^p\lambda_f(\alpha)\right)^{1/p},$$

and we define weak $L^p$ to be the set of all $f$ such that $[f]_p < \infty$. $[\cdot]_p$ is not a norm; it is easily checked that $[cf]_p = |c|[f]_p$, but the triangle inequality fails. However, weak $L^p$ is a topological vector space; see Exercise 35.

The relationship between $L^p$ and weak $L^p$ is as follows. On the one hand,

$$L^p \subset \text{weak } L^p, \quad\text{and}\quad [f]_p \le \|f\|_p.$$

(This is just a restatement of Chebyshev's inequality.) On the other hand, if we replace $\lambda_f(\alpha)$ by $([f]_p/\alpha)^p$ in the integral $p\int_0^\infty \alpha^{p-1}\lambda_f(\alpha)\,d\alpha$, which equals $\|f\|_p^p$, we obtain a constant times $\int_0^\infty \alpha^{-1}\,d\alpha$, which is divergent at both $0$ and $\infty$, but just barely. One needs only slightly stronger estimates on $\lambda_f$ near $0$ and $\infty$ to obtain $f \in L^p$. (See also Exercise 36.) The standard example of a function that is in weak $L^p$ but not in $L^p$ is $f(x) = x^{-1/p}$ on $(0,\infty)$ (with Lebesgue measure). Indeed, here $\lambda_f(\alpha) = \alpha^{-p}$, so $[f]_p = 1$, whereas $\int_0^\infty |f|^p\,dx = \int_0^\infty x^{-1}\,dx = \infty$.

Frequently it is convenient to express a function as the sum of a “small” part and 
a “big” part. The following is a way of doing this that gives a simple formula for the 
distribution functions. 


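6.25 Proposition. If $f$ is a measurable function and $A > 0$, let $E(A) = \{x : |f(x)| > A\}$, and set

$$h_A = f\chi_{X\setminus E(A)} + A(\operatorname{sgn} f)\chi_{E(A)}, \qquad g_A = f - h_A = (\operatorname{sgn} f)(|f| - A)\chi_{E(A)}.$$

Then

$$\lambda_{g_A}(\alpha) = \lambda_f(\alpha + A), \qquad \lambda_{h_A}(\alpha) = \begin{cases}\lambda_f(\alpha) & \text{if } \alpha < A,\\ 0 & \text{if } \alpha \ge A.\end{cases}$$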


The proof is left to the reader (Exercise 37). 
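The following is a small numerical sketch of Proposition 6.25. It assumes a real-valued $f$ on finitely many atoms, so that $\operatorname{sgn} f = \pm 1$ and the truncation is exact in floating point. The helper names and sample data are hypothetical. The code builds $h_A$ and $g_A$ so that their distribution functions can be compared with $\lambda_f$. It does not replace the proof asked for in Exercise 37.

```python
import math


def split_at_level(values, A):
    """Real-valued sketch of Proposition 6.25 on finitely many atoms.

    h_A agrees with f where |f| <= A and equals A * sgn(f) elsewhere;
    g_A = f - h_A equals sgn(f) * (|f| - A) on E(A) and 0 off E(A).
    """
    h = [v if abs(v) <= A else math.copysign(A, v) for v in values]
    g = [v - hv for v, hv in zip(values, h)]
    return g, h


def distribution(values, masses, alpha):
    """lambda_f(alpha) = total mass of the atoms where |f| > alpha."""
    return sum(m for v, m in zip(values, masses) if abs(v) > alpha)


VALUES = [3.0, -1.2, 0.5, 2.7, -4.0]   # sample f (illustrative)
MASSES = [0.5, 1.0, 2.0, 0.25, 0.125]  # sample atom masses (illustrative)
```

With $A = 1$, for example, $\lambda_{g_A}(\alpha)$ matches $\lambda_f(\alpha + 1)$, and $\lambda_{h_A}(\alpha)$ vanishes as soon as $\alpha \ge 1$.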


Exercises


35. For any measurable $f$ and $g$ we have $[cf]_p = |c|[f]_p$ and $[f+g]_p \le 2([f]_p^p + [g]_p^p)^{1/p}$; hence weak $L^p$ is a vector space. Moreover, the "balls" $\{g : [g - f]_p < r\}$ ($r > 0$, $f \in$ weak $L^p$) generate a topology on weak $L^p$ that makes weak $L^p$ into a topological vector space.


36. If $f \in$ weak $L^p$ and $\mu(\{x : f(x) \ne 0\}) < \infty$, then $f \in L^q$ for all $q < p$. On the other hand, if $f \in (\text{weak } L^p) \cap L^\infty$, then $f \in L^q$ for all $q > p$.


37. Prove Proposition 6.25.
38. $f \in L^p$ iff $\sum_{-\infty}^\infty 2^{kp}\lambda_f(2^k) < \infty$.


39. If $f \in L^p$, then $\lim_{\alpha\to0}\alpha^p\lambda_f(\alpha) = \lim_{\alpha\to\infty}\alpha^p\lambda_f(\alpha) = 0$. (First suppose $f$ is simple.)


40. If $f$ is a measurable function on $X$, its decreasing rearrangement is the function $f^* : (0,\infty) \to [0,\infty]$ defined by

$$f^*(t) = \inf\{\alpha : \lambda_f(\alpha) \le t\} \quad (\text{where } \inf\varnothing = \infty).$$

a. $f^*$ is decreasing. If $f^*(t) < \infty$ then $\lambda_f(f^*(t)) \le t$, and if $\lambda_f(\alpha) < \infty$ then $f^*(\lambda_f(\alpha)) \le \alpha$.

b. $\lambda_f = \lambda_{f^*}$, where $\lambda_{f^*}$ is defined with respect to Lebesgue measure on $(0,\infty)$.
c. If $\lambda_f(\alpha) < \infty$ for all $\alpha > 0$ and $\lim_{\alpha\to\infty}\lambda_f(\alpha) = 0$ (so that $f^*(t) < \infty$ for all $t > 0$), and $\phi$ is a nonnegative measurable function on $(0,\infty)$, then $\int_X \phi\circ|f|\,d\mu = \int_0^\infty \phi\circ f^*(t)\,dt$. In particular, $\|f\|_p = \|f^*\|_p$ for $0 < p < \infty$.
d. If $0 < p < \infty$, $[f]_p = \sup_{t>0}t^{1/p}f^*(t)$.

e. The name "rearrangement" for $f^*$ comes from the case where $f$ is a nonnegative function on $(0,\infty)$. To see why it is appropriate, pick a step function on $(0,\infty)$ assuming four or five different values and draw the graphs of $f$ and $f^*$.
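Exercise 40e suggests drawing a step function together with its rearrangement. The sketch below gives a computational version of that experiment. It encodes a step function on $(0,\infty)$ as a list of (length, value) pieces, which is a hypothetical encoding. It forms $f^*$ by sorting the values of $|f|$ in decreasing order and compares the result with the definition $f^*(t) = \inf\{\alpha : \lambda_f(\alpha) \le t\}$. Sorting is valid only for such step functions; this is an illustration, not a general construction.

```python
import math


def rearrange(pieces):
    """Decreasing rearrangement of a step function on (0, inf).

    `pieces` lists consecutive intervals starting at 0 as (length, value);
    f vanishes after the last interval. The result keeps the lengths and
    orders the values |v| from largest to smallest.
    """
    return sorted(((length, abs(v)) for length, v in pieces), key=lambda lv: -lv[1])


def lam(pieces, alpha):
    """Distribution function lambda_f(alpha) with respect to Lebesgue measure."""
    return sum(length for length, v in pieces if abs(v) > alpha)


def f_star_from_definition(pieces, t):
    """f*(t) = inf{alpha : lambda_f(alpha) <= t}, as in Exercise 40."""
    # lambda_f is a right-continuous decreasing step function that jumps only at
    # the values |v|, so the infimum is 0 or one of those values.
    for a in sorted({0.0} | {abs(v) for _, v in pieces}):
        if lam(pieces, a) <= t:
            return a
    return math.inf


def evaluate_step(pieces, t):
    """Value at t of the step function encoded by `pieces` (intervals [s, s + length))."""
    start = 0.0
    for length, v in pieces:
        if start <= t < start + length:
            return v
        start += length
    return 0.0


EXAMPLE = [(1.0, 2.0), (0.5, -3.0), (2.0, 0.5), (1.0, 1.0)]  # illustrative step function
```

For `EXAMPLE`, the rearrangement is 3 on $[0, \frac12)$, 2 on $[\frac12, \frac32)$, 1 on $[\frac32, \frac52)$, $\frac12$ on $[\frac52, \frac92)$ and 0 afterwards. Each value occupies the same total length as in $f$, which is why $\|f\|_p = \|f^*\|_p$.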
6.5 INTERPOLATION OF $L^p$ SPACES


If $1 \le p < q < r \le \infty$, then $(L^p \cap L^r) \subset L^q \subset (L^p + L^r)$, and it is natural to ask whether a linear operator $T$ on $L^p + L^r$ that is bounded on both $L^p$ and $L^r$ is also bounded on $L^q$. The answer is affirmative, and this result can be generalized in various ways. The two fundamental theorems on this question are the Riesz-Thorin and Marcinkiewicz interpolation theorems, which we present in this section. We begin with the Riesz-Thorin theorem, whose proof is based on the following result from complex function theory.


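6.26 The Three Lines Lemma. Let $\phi$ be a bounded continuous function on the strip $0 \le \operatorname{Re} z \le 1$ that is holomorphic on the interior of the strip. If $|\phi(z)| \le M_0$ for $\operatorname{Re} z = 0$ and $|\phi(z)| \le M_1$ for $\operatorname{Re} z = 1$, then $|\phi(z)| \le M_0^{1-t}M_1^t$ for $\operatorname{Re} z = t$, $0 < t < 1$.


Proof. For $\epsilon > 0$ let $\phi_\epsilon(z) = \phi(z)M_0^{z-1}M_1^{-z}\exp(\epsilon z(z-1))$. Then $\phi_\epsilon$ satisfies the hypotheses of the lemma with $M_0$ and $M_1$ replaced by $1$, and also $|\phi_\epsilon(z)| \to 0$ as $|\operatorname{Im} z| \to \infty$. Thus $|\phi_\epsilon(z)| \le 1$ on the boundary of the rectangle $0 \le \operatorname{Re} z \le 1$, $-A \le \operatorname{Im} z \le A$ provided that $A$ is large, and the maximum modulus principle therefore implies that $|\phi_\epsilon(z)| \le 1$ on the strip $0 \le \operatorname{Re} z \le 1$. Letting $\epsilon \to 0$, we obtain the desired result:

$$|\phi(z)|M_0^{t-1}M_1^{-t} = \lim_{\epsilon\to0}|\phi_\epsilon(z)| \le 1 \quad\text{for } \operatorname{Re} z = t.$$ ∎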
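The three lines lemma can be watched in action on $\phi(z) = e^{z^2}$. This function is entire and bounded on the strip, since $|e^{z^2}| = e^{x^2 - y^2}$ for $z = x + iy$. Here $M_0 = 1$ and $M_1 = e$, so the lemma predicts $\sup_y|\phi(t+iy)| \le e^t$, while the true supremum is $e^{t^2}$. The sketch samples $|\phi|$ on a finite grid of $y$ values, which is an assumption of the illustration. The grid used below contains $y = 0$, where all the suprema are attained.

```python
import cmath


def sup_on_line(phi, t, ys):
    """Approximate sup over y of |phi(t + i y)| using the sample points ys."""
    return max(abs(phi(complex(t, y))) for y in ys)


def check_three_lines(phi, ts, ys):
    """Check sup on Re z = t against M0^(1 - t) * M1^t for each sampled t."""
    m0 = sup_on_line(phi, 0.0, ys)
    m1 = sup_on_line(phi, 1.0, ys)
    return all(sup_on_line(phi, t, ys) <= m0 ** (1 - t) * m1 ** t * (1 + 1e-12) for t in ts)


def gaussian_like(z):
    return cmath.exp(z * z)  # |exp(z^2)| = exp(x^2 - y^2)
```

For example, `check_three_lines(gaussian_like, [k / 20 for k in range(21)], [k / 100 for k in range(-1000, 1001)])` returns `True`. The gap between $e^{t^2}$ and $e^t$ shows that the bound need not be sharp in the interior.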


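6.27 The Riesz-Thorin Interpolation Theorem. Suppose that $(X,\mathcal{M},\mu)$ and $(Y,\mathcal{N},\nu)$ are measure spaces and $p_0, p_1, q_0, q_1 \in [1,\infty]$. If $q_0 = q_1 = \infty$, suppose also that $\nu$ is semifinite. For $0 < t < 1$, define $p_t$ and $q_t$ by

$$\frac{1}{p_t} = \frac{1-t}{p_0} + \frac{t}{p_1}, \qquad \frac{1}{q_t} = \frac{1-t}{q_0} + \frac{t}{q_1}.$$

If $T$ is a linear map from $L^{p_0}(\mu) + L^{p_1}(\mu)$ into $L^{q_0}(\nu) + L^{q_1}(\nu)$ such that $\|Tf\|_{q_0} \le M_0\|f\|_{p_0}$ for $f \in L^{p_0}(\mu)$ and $\|Tf\|_{q_1} \le M_1\|f\|_{p_1}$ for $f \in L^{p_1}(\mu)$, then $\|Tf\|_{q_t} \le M_0^{1-t}M_1^t\|f\|_{p_t}$ for $f \in L^{p_t}(\mu)$, $0 < t < 1$.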


Proof. To begin with, we observe that the case $p_0 = p_1$ follows from Proposition 6.10: If $p = p_0 = p_1$, then

$$\|Tf\|_{q_t} \le \|Tf\|_{q_0}^{1-t}\|Tf\|_{q_1}^t \le M_0^{1-t}M_1^t\|f\|_p.$$

Thus we may assume that $p_0 \ne p_1$, and in particular that $p_t < \infty$ for $0 < t < 1$. Let $\Sigma_X$ (resp. $\Sigma_Y$) be the space of all simple functions on $X$ (resp. $Y$) that vanish outside sets of finite measure. Then $\Sigma_X \subset L^p(\mu)$ for all $p$ and $\Sigma_X$ is dense in $L^p(\mu)$ for $p < \infty$, by Proposition 6.7; similarly for $\Sigma_Y$. The main part of the proof consists
of showing that $\|Tf\|_{q_t} \le M_0^{1-t}M_1^t\|f\|_{p_t}$ for all $f \in \Sigma_X$. However, by Theorem 6.14,

$$\|Tf\|_{q_t} = \sup\left\{\left|\int (Tf)g\,d\nu\right| : g \in \Sigma_Y \text{ and } \|g\|_{q_t'} = 1\right\},$$

where $q_t'$ is the conjugate exponent to $q_t$. (Note that $Tf \in L^{q_0} \cap L^{q_1}$, so $\{y : Tf(y) \ne 0\}$ must be $\sigma$-finite unless $q_0 = q_1 = \infty$; hence the hypotheses of Theorem 6.14 are satisfied.) Moreover, we may assume that $f \ne 0$ and rescale $f$ so that $\|f\|_{p_t} = 1$. We therefore wish to establish the following claim:

• If $f \in \Sigma_X$ and $\|f\|_{p_t} = 1$, then $|\int (Tf)g\,d\nu| \le M_0^{1-t}M_1^t$ for all $g \in \Sigma_Y$ such that $\|g\|_{q_t'} = 1$.

Let $f = \sum_1^m c_j\chi_{E_j}$ and $g = \sum_1^n d_k\chi_{F_k}$, where the $E_j$'s and the $F_k$'s are disjoint in $X$ and $Y$ and the $c_j$'s and $d_k$'s are nonzero. Write $c_j$ and $d_k$ in polar form: $c_j = |c_j|e^{i\theta_j}$, $d_k = |d_k|e^{i\psi_k}$. Also, let

$$\alpha(z) = (1-z)p_0^{-1} + zp_1^{-1}, \qquad \beta(z) = (1-z)q_0^{-1} + zq_1^{-1};$$

thus $\alpha(t) = p_t^{-1}$ and $\beta(t) = q_t^{-1}$ for $0 < t < 1$. Fix $t \in (0,1)$; we have assumed that $p_t < \infty$ and hence $\alpha(t) > 0$, so we may define

$$f_z = \sum_1^m |c_j|^{\alpha(z)/\alpha(t)}e^{i\theta_j}\chi_{E_j}.$$

If $\beta(t) < 1$, we define

$$g_z = \sum_1^n |d_k|^{(1-\beta(z))/(1-\beta(t))}e^{i\psi_k}\chi_{F_k},$$

while if $\beta(t) = 1$ we define $g_z = g$ for all $z$. (We henceforth assume that $\beta(t) < 1$ and leave the easy modification for $\beta(t) = 1$ to the reader.) Finally, we set

$$\phi(z) = \int (Tf_z)g_z\,d\nu.$$

Thus,

$$\phi(z) = \sum_{j,k}A_{jk}|c_j|^{\alpha(z)/\alpha(t)}|d_k|^{(1-\beta(z))/(1-\beta(t))},$$

where

$$A_{jk} = e^{i(\theta_j+\psi_k)}\int (T\chi_{E_j})\chi_{F_k}\,d\nu,$$

so that $\phi$ is an entire holomorphic function of $z$ that is bounded in the strip $0 \le \operatorname{Re} z \le 1$. Since $\int (Tf)g\,d\nu = \phi(t)$, by the three lines lemma it will suffice to show that $|\phi(z)| \le M_0$ for $\operatorname{Re} z = 0$ and $|\phi(z)| \le M_1$ for $\operatorname{Re} z = 1$. However, since

$$\alpha(is) = p_0^{-1} + is(p_1^{-1} - p_0^{-1}), \qquad 1 - \beta(is) = (1 - q_0^{-1}) - is(q_1^{-1} - q_0^{-1})$$
for $s \in \mathbb{R}$, we have

$$|f_{is}| = |f|^{p_t/p_0}, \qquad |g_{is}| = |g|^{q_t'/q_0'}.$$

Therefore, by Hölder's inequality,

$$|\phi(is)| \le \|Tf_{is}\|_{q_0}\|g_{is}\|_{q_0'} \le M_0\|f_{is}\|_{p_0}\|g_{is}\|_{q_0'} = M_0\|f\|_{p_t}^{p_t/p_0}\|g\|_{q_t'}^{q_t'/q_0'} = M_0.$$

A similar calculation shows that $|\phi(1+is)| \le M_1$, so the claim is proved.

We have now shown that $\|Tf\|_{q_t} \le M_0^{1-t}M_1^t\|f\|_{p_t}$ for $f \in \Sigma_X$, so in view of Proposition 6.7, $T|\Sigma_X$ has a unique extension to $L^{p_t}(\mu)$ satisfying the same estimate there. It remains to show that this extension is $T$ itself, that is, that $T$ satisfies this estimate for all $f \in L^{p_t}(\mu)$. Given such an $f$, choose a sequence $\{f_n\}$ in $\Sigma_X$ such that $|f_n| \le |f|$ and $f_n \to f$ pointwise. Also, let $E = \{x : |f(x)| > 1\}$, $g = f\chi_E$, $g_n = f_n\chi_E$, $h = f - g$, and $h_n = f_n - g_n$. Then if $p_0 < p_1$ (which we may assume, by relabeling the $p$'s), we have $g \in L^{p_0}(\mu)$, $h \in L^{p_1}(\mu)$, and by the dominated convergence theorem, $\|f_n - f\|_{p_t} \to 0$, $\|g_n - g\|_{p_0} \to 0$, and $\|h_n - h\|_{p_1} \to 0$. Hence $\|Tg_n - Tg\|_{q_0} \to 0$ and $\|Th_n - Th\|_{q_1} \to 0$, so by passing to a suitable subsequence we may assume that $Tg_n \to Tg$ a.e. and $Th_n \to Th$ a.e. (Exercise 9). But then $Tf_n \to Tf$ a.e., so by Fatou's lemma,

$$\|Tf\|_{q_t} \le \liminf\|Tf_n\|_{q_t} \le \liminf M_0^{1-t}M_1^t\|f_n\|_{p_t} = M_0^{1-t}M_1^t\|f\|_{p_t},$$

and we are done. ∎


The conclusion of the Riesz-Thorin theorem can be restated in a slightly stronger form. Let $M(t)$ be the operator norm of $T$ as a map from $L^{p_t}(\mu)$ to $L^{q_t}(\nu)$. We have shown that $M(t) \le M_0^{1-t}M_1^t$. It is possible for strict inequality to hold; however, if $0 \le s < t < u \le 1$ and $t = (1-\tau)s + \tau u$, the theorem may be applied again to show that $M(t) \le M(s)^{1-\tau}M(u)^\tau$. In short, the conclusion is that $\log M(t)$ is a
convex function of $t$.
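A finite-dimensional instance makes the statement concrete. Take counting measure on $\{1,\dots,n\}$, $p_0 = q_0 = 1$, $p_1 = q_1 = \infty$ and $t = \frac12$, so that $p_t = q_t = 2$. For a real matrix $A$, $M_0$ is then the largest absolute column sum and $M_1$ the largest absolute row sum, and the theorem gives $\|A\|_{2\to2} \le \sqrt{M_0M_1}$. The sketch below estimates $\|A\|_{2\to2}$ by power iteration on $A^{\mathsf T}A$, a standard numerical method chosen here only for illustration. The helper names are hypothetical.

```python
import math
import random


def norm_1(A):
    """Operator norm on l^1 (M_0): largest absolute column sum."""
    return max(sum(abs(row[j]) for row in A) for j in range(len(A[0])))


def norm_inf(A):
    """Operator norm on l^inf (M_1): largest absolute row sum."""
    return max(sum(abs(x) for x in row) for row in A)


def matvec(A, v):
    return [sum(a * x for a, x in zip(row, v)) for row in A]


def norm_2(A, iters=500, seed=0):
    """Estimate the l^2 operator norm (largest singular value) by power iteration."""
    rng = random.Random(seed)
    v = [rng.uniform(-1, 1) for _ in range(len(A[0]))]
    At = [list(col) for col in zip(*A)]  # transpose
    for _ in range(iters):
        w = matvec(At, matvec(A, v))
        size = math.sqrt(sum(x * x for x in w))
        if size == 0:
            return 0.0
        v = [x / size for x in w]  # keep v a unit vector
    return math.sqrt(sum(x * x for x in matvec(A, v)))


def check_riesz_thorin(trials=100, seed=2):
    """Compare ||A||_2 with sqrt(M_0 * M_1) on random matrices."""
    rng = random.Random(seed)
    for _ in range(trials):
        m, n = rng.randint(1, 5), rng.randint(1, 5)
        A = [[rng.uniform(-2, 2) for _ in range(n)] for _ in range(m)]
        if norm_2(A) > math.sqrt(norm_1(A) * norm_inf(A)) + 1e-9:
            return False
    return True
```

For $A = \begin{pmatrix}1&2\\3&4\end{pmatrix}$ one has $M_0 = 6$ and $M_1 = 7$. The true norm is $\sqrt{15 + \sqrt{221}} \approx 5.46$, comfortably below $\sqrt{42} \approx 6.48$.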

We now turn to the Marcinkiewicz theorem, for which we need some more terminology. Let $T$ be a map from some vector space $\mathcal{D}$ of measurable functions on $(X,\mathcal{M},\mu)$ to the space of all measurable functions on $(Y,\mathcal{N},\nu)$.

• $T$ is called sublinear if $|T(f+g)| \le |Tf| + |Tg|$ and $|T(cf)| = c|Tf|$ for all $f, g \in \mathcal{D}$ and $c > 0$.

• A sublinear map $T$ is strong type $(p,q)$ ($1 \le p,q \le \infty$) if $L^p(\mu) \subset \mathcal{D}$, $T$ maps $L^p(\mu)$ into $L^q(\nu)$, and there exists $C > 0$ such that $\|Tf\|_q \le C\|f\|_p$ for all $f \in L^p(\mu)$.

• A sublinear map $T$ is weak type $(p,q)$ ($1 \le p \le \infty$, $1 \le q < \infty$) if $L^p(\mu) \subset \mathcal{D}$, $T$ maps $L^p(\mu)$ into weak $L^q(\nu)$, and there exists $C > 0$ such that $[Tf]_q \le C\|f\|_p$ for all $f \in L^p(\mu)$. Also, we shall say that $T$ is weak type $(p,\infty)$ iff $T$ is strong type $(p,\infty)$.
6.28 The Marcinkiewicz Interpolation Theorem. Suppose that $(X,\mathcal{M},\mu)$ and $(Y,\mathcal{N},\nu)$ are measure spaces; $p_0, p_1, q_0, q_1$ are elements of $[1,\infty]$ such that $p_0 \le q_0$, $p_1 \le q_1$, and $q_0 \ne q_1$; and

$$\frac{1}{p} = \frac{1-t}{p_0} + \frac{t}{p_1}, \qquad \frac{1}{q} = \frac{1-t}{q_0} + \frac{t}{q_1}, \qquad\text{where } 0 < t < 1.$$

If $T$ is a sublinear map from $L^{p_0}(\mu) + L^{p_1}(\mu)$ to the space of measurable functions on $Y$ that is weak types $(p_0,q_0)$ and $(p_1,q_1)$, then $T$ is strong type $(p,q)$. More precisely, if $[Tf]_{q_j} \le C_j\|f\|_{p_j}$ for $j = 0,1$, then $\|Tf\|_q \le B_p\|f\|_p$ where $B_p$ depends only on $p_j$, $q_j$, $C_j$ in addition to $p$; and for $j = 0,1$, $B_p|p - p_j|$ (resp. $B_p$) remains bounded as $p \to p_j$ if $p_j < \infty$ (resp. $p_j = \infty$).


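Proof. The case $p_0 = p_1$ is easy and is left to the reader (Exercise 42). Without loss of generality we may therefore assume that $p_0 < p_1$, and for the time being we also assume that $q_0 < \infty$ and $q_1 < \infty$ (whence also $p_0 < p_1 < \infty$). Given $f \in L^p(\mu)$ and $A > 0$, let $g_A$ and $h_A$ be as in Proposition 6.25. Then by Propositions 6.24 and 6.25,

$$\begin{aligned}
\int |g_A|^{p_0}\,d\mu &= p_0\int_0^\infty \beta^{p_0-1}\lambda_{g_A}(\beta)\,d\beta = p_0\int_0^\infty \beta^{p_0-1}\lambda_f(\beta + A)\,d\beta\\
&= p_0\int_A^\infty (\beta - A)^{p_0-1}\lambda_f(\beta)\,d\beta \le p_0\int_A^\infty \beta^{p_0-1}\lambda_f(\beta)\,d\beta,\\
\int |h_A|^{p_1}\,d\mu &= p_1\int_0^\infty \beta^{p_1-1}\lambda_{h_A}(\beta)\,d\beta = p_1\int_0^A \beta^{p_1-1}\lambda_f(\beta)\,d\beta.
\end{aligned}\tag{6.29}$$

Likewise,

$$\int |Tf|^q\,d\nu = q\int_0^\infty \alpha^{q-1}\lambda_{Tf}(\alpha)\,d\alpha = 2^qq\int_0^\infty \alpha^{q-1}\lambda_{Tf}(2\alpha)\,d\alpha. \tag{6.30}$$

Since $T$ is sublinear, by Proposition 6.22d we have

$$\lambda_{Tf}(2\alpha) \le \lambda_{Tg_A}(\alpha) + \lambda_{Th_A}(\alpha). \tag{6.31}$$

This is true for all $\alpha > 0$ and $A > 0$, so we may take $A$ to depend on $\alpha$. We now make a specific choice of $A$. Namely, it follows from the equations defining $p$ and $q$ that

$$\frac{p_0(q_0 - q)}{q_0(p_0 - p)} = \frac{p_1(q_1 - q)}{q_1(p_1 - p)}; \tag{6.32}$$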
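we denote the common value of these quantities by $\sigma$, and we take $A = \alpha^\sigma$. Then by (6.29), (6.30), (6.31), and the weak type estimates on $T$,

$$\begin{aligned}
\|Tf\|_q^q &\le 2^qq\int_0^\infty \alpha^{q-1}\left[\left(\frac{C_0\|g_A\|_{p_0}}{\alpha}\right)^{q_0} + \left(\frac{C_1\|h_A\|_{p_1}}{\alpha}\right)^{q_1}\right]d\alpha\\
&\le 2^qqC_0^{q_0}p_0^{q_0/p_0}\int_0^\infty \alpha^{q-q_0-1}\left[\int_{\alpha^\sigma}^\infty \beta^{p_0-1}\lambda_f(\beta)\,d\beta\right]^{q_0/p_0}d\alpha\\
&\quad + 2^qqC_1^{q_1}p_1^{q_1/p_1}\int_0^\infty \alpha^{q-q_1-1}\left[\int_0^{\alpha^\sigma}\beta^{p_1-1}\lambda_f(\beta)\,d\beta\right]^{q_1/p_1}d\alpha\\
&= 2^qq\sum_{j=0}^1 C_j^{q_j}p_j^{q_j/p_j}\int_0^\infty\left[\int_0^\infty \phi_j(\alpha,\beta)\,d\beta\right]^{q_j/p_j}d\alpha,
\end{aligned}\tag{6.33}$$

where, denoting by $\chi_0$ and $\chi_1$ the characteristic functions of $\{(\alpha,\beta) : \beta > \alpha^\sigma\}$ and $\{(\alpha,\beta) : \beta \le \alpha^\sigma\}$,

$$\phi_j(\alpha,\beta) = \chi_j(\alpha,\beta)\,\alpha^{(q-q_j-1)p_j/q_j}\beta^{p_j-1}\lambda_f(\beta).$$

Since $q_0/p_0 \ge 1$ and $q_1/p_1 \ge 1$, we may apply Minkowski's inequality for integrals to obtain

$$\int_0^\infty\left[\int_0^\infty \phi_j(\alpha,\beta)\,d\beta\right]^{q_j/p_j}d\alpha \le \left\{\int_0^\infty\left[\int_0^\infty \phi_j(\alpha,\beta)^{q_j/p_j}\,d\alpha\right]^{p_j/q_j}d\beta\right\}^{q_j/p_j}.$$

Let $\tau = 1/\sigma$. If $q_1 > q_0$, then $q - q_0$ and $\sigma$ are positive and the inequality $\beta > \alpha^\sigma$ is equivalent to $\alpha < \beta^\tau$, so

$$\begin{aligned}
\int_0^\infty\left[\int_0^\infty \phi_0(\alpha,\beta)^{q_0/p_0}\,d\alpha\right]^{p_0/q_0}d\beta
&= \int_0^\infty\left[\int_0^{\beta^\tau}\alpha^{q-q_0-1}\,d\alpha\right]^{p_0/q_0}\beta^{p_0-1}\lambda_f(\beta)\,d\beta\\
&= (q - q_0)^{-p_0/q_0}\int_0^\infty \beta^{p_0-1+p_0(q-q_0)\tau/q_0}\lambda_f(\beta)\,d\beta\\
&= (q - q_0)^{-p_0/q_0}\int_0^\infty \beta^{p-1}\lambda_f(\beta)\,d\beta\\
&= (q - q_0)^{-p_0/q_0}p^{-1}\|f\|_p^p,
\end{aligned}\tag{6.34}$$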
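where we have used (6.32) to simplify the exponent of $\beta$. On the other hand, if $q_1 < q_0$, then $q - q_0$ and $\sigma$ are negative and the inequality $\beta > \alpha^\sigma$ is equivalent to $\alpha > \beta^\tau$, so as above,

$$\begin{aligned}
\int_0^\infty\left[\int_0^\infty \phi_0(\alpha,\beta)^{q_0/p_0}\,d\alpha\right]^{p_0/q_0}d\beta
&= \int_0^\infty\left[\int_{\beta^\tau}^\infty \alpha^{q-q_0-1}\,d\alpha\right]^{p_0/q_0}\beta^{p_0-1}\lambda_f(\beta)\,d\beta\\
&= (q_0 - q)^{-p_0/q_0}\int_0^\infty \beta^{p-1}\lambda_f(\beta)\,d\beta = |q - q_0|^{-p_0/q_0}p^{-1}\|f\|_p^p.
\end{aligned}$$

A similar calculation shows that

$$\int_0^\infty\left[\int_0^\infty \phi_1(\alpha,\beta)^{q_1/p_1}\,d\alpha\right]^{p_1/q_1}d\beta = |q - q_1|^{-p_1/q_1}p^{-1}\|f\|_p^p.$$

Combining these results with (6.33) and (6.34), we see that

$$\sup\{\|Tf\|_q : \|f\|_p = 1\} \le B_p = 2\left[q\sum_{j=0}^1 C_j^{q_j}\left(\frac{p_j}{p}\right)^{q_j/p_j}|q - q_j|^{-1}\right]^{1/q}.$$

But since $|T(cf)| = c|Tf|$ for $c > 0$, this implies that $\|Tf\|_q \le B_p\|f\|_p$ for all $f \in L^p(\mu)$, and we are done. (The verification of the asserted properties of $B_p$ is left as an easy exercise.)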

It remains to show how to modify this argument to deal with the exceptional cases $q_0 = \infty$ or $q_1 = \infty$. We distinguish three cases.

Case I: $p_1 = q_1 = \infty$ (so $p_0 \le q_0 < \infty$). Instead of taking $A = \alpha^\sigma$ in the decomposition of $f$, we take $A = \alpha/C_1$. Then $\|Th_A\|_\infty \le C_1\|h_A\|_\infty \le \alpha$, so $\lambda_{Th_A}(\alpha) = 0$, and we obtain (6.33) with $\phi_1 = 0$ and $\alpha^\sigma$ replaced by $\alpha/C_1$ in the definition of $\phi_0$. The same argument as above then gives

$$\|Tf\|_q \le 2\left[qC_0^{q_0}C_1^{q-q_0}(p_0/p)^{q_0/p_0}|q - q_0|^{-1}\right]^{1/q}\|f\|_p.$$

Case II: $p_0 < p_1 < \infty$, $q_0 < q_1 = \infty$. Again the idea is to choose $A$ so that $\lambda_{Th_A}(\alpha) = 0$, and the proper choice is $A = (\alpha/d)^\sigma$ where $d = C_1[p_1\|f\|_p^p/p]^{1/p_1}$ and $\sigma = p_1/(p_1 - p)$ (the limiting value of the $\sigma$ defined by (6.32) as $q_1 \to \infty$). Indeed, since $p_1 > p$, we have

$$\begin{aligned}
\|Th_A\|_\infty^{p_1} &\le C_1^{p_1}\|h_A\|_{p_1}^{p_1} = C_1^{p_1}p_1\int_0^A \beta^{p_1-1}\lambda_f(\beta)\,d\beta\\
&\le C_1^{p_1}p_1A^{p_1-p}\int_0^A \beta^{p-1}\lambda_f(\beta)\,d\beta \le C_1^{p_1}\frac{p_1}{p}\left(\frac{\alpha}{d}\right)^{p_1}\|f\|_p^p = \alpha^{p_1}.
\end{aligned}$$

As in Case I, then, we find that $\phi_1 = 0$ in (6.33) and the integral involving $\phi_0$ is majorized by a constant $B_p$ when $\|f\|_p = 1$, which yields the desired result.

Case III: $p_0 < p_1 < \infty$, $q_1 < q_0 = \infty$. The argument is essentially the same as in Case II, except that we take $A = (\alpha/d)^\sigma$ with $d$ chosen so that $\lambda_{Tg_A}(\alpha) = 0$. ∎
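The identity (6.32) and the constant $B_p$ from the finite-exponent part of the proof can be evaluated directly. The following sketch computes $p$, $q$, both sides of (6.32) and $B_p$; the function name and the sample exponents are illustrative assumptions. As $t \to 0$ one can watch $B_p|p - p_0|$ stay bounded, as Theorem 6.28 asserts.

```python
def marcinkiewicz_data(p0, q0, p1, q1, C0, C1, t):
    """Return (p, q, left side of (6.32), right side of (6.32), B_p).

    Assumes the case treated first in the proof: 1 <= p0 < p1 < inf,
    p_j <= q_j < inf, q0 != q1, and 0 < t < 1.
    """
    p = 1 / ((1 - t) / p0 + t / p1)
    q = 1 / ((1 - t) / q0 + t / q1)
    sigma0 = p0 * (q0 - q) / (q0 * (p0 - p))
    sigma1 = p1 * (q1 - q) / (q1 * (p1 - p))
    # B_p = 2 [ q * sum_j C_j^{q_j} (p_j / p)^{q_j / p_j} |q - q_j|^{-1} ]^{1/q}
    terms = [
        C ** qj * (pj / p) ** (qj / pj) / abs(q - qj)
        for C, pj, qj in ((C0, p0, q0), (C1, p1, q1))
    ]
    B = 2 * (q * sum(terms)) ** (1 / q)
    return p, q, sigma0, sigma1, B
```

For instance, with $(p_0,q_0,p_1,q_1) = (1,2,3,4)$ and $t = 0.3$ one gets $p = 1.25$. Both sides of (6.32) then equal $\sigma \approx 0.706$, which is positive because $q_1 > q_0$.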
The lengthy formulas in this proof may seem daunting, but the ideas are reasonably simple. To elucidate them, we recommend the exercise of writing out the proof for two special (but important) cases: (i) $p_0 = q_0 = 1$, $p_1 = q_1 = 2$, and (ii) $p_0 = q_0 = 1$, $p_1 = q_1 = \infty$.

Let us compare our two interpolation theorems. The Marcinkiewicz theorem requires some restrictions on $p_j$ and $q_j$ that are not present in the Riesz-Thorin theorem; these restrictions, however, are satisfied in all the interesting applications. Apart from this, the hypotheses of the Marcinkiewicz theorem are weaker: $T$ is allowed to be sublinear rather than linear, and it needs only to satisfy weak-type estimates at the endpoints. The conclusion in both cases is that $T$ is bounded from $L^p(\mu)$ to $L^q(\nu)$, but the Riesz-Thorin theorem produces a much sharper estimate for the operator norm of $T$. Thus neither theorem includes the other.

We conclude with two applications of the Marcinkiewicz theorem. The first one concerns the Hardy-Littlewood maximal operator $H$ discussed in §3.4,

$$Hf(x) = \sup_{r>0}\frac{1}{m(B(r,x))}\int_{B(r,x)}|f(y)|\,dy \qquad (f \in L^1_{\mathrm{loc}}(\mathbb{R}^n)).$$

$H$ is obviously sublinear and satisfies $\|Hf\|_\infty \le \|f\|_\infty$ for all $f \in L^\infty$. Moreover, Theorem 3.17 says precisely that $H$ is weak type $(1,1)$. We conclude:


6.35 Corollary. There is a constant $C > 0$ such that if $1 < p \le \infty$ and $f \in L^p(\mathbb{R}^n)$, then

$$\|Hf\|_p \le \frac{Cp}{p-1}\|f\|_p.$$
Our second application is a theorem on integral operators related to Theorem 6.18.
6.36 Theorem. Suppose $(X,\mathcal{M},\mu)$ and $(Y,\mathcal{N},\nu)$ are $\sigma$-finite measure spaces, and $1 < q < \infty$. Let $K$ be a measurable function on $X \times Y$ such that, for some $C > 0$, we have $[K(x,\cdot)]_q \le C$ for a.e. $x \in X$ and $[K(\cdot,y)]_q \le C$ for a.e. $y \in Y$. If $1 \le p < q'$, where $q'$ is the conjugate exponent to $q$, and $f \in L^p(\nu)$, the integral

$$Tf(x) = \int K(x,y)f(y)\,d\nu(y)$$

converges absolutely for a.e. $x \in X$, and the operator $T$ thus defined is weak type $(1,q)$ and strong type $(p,r)$ for all $p, r$ such that $1 < p < r < \infty$ and $p^{-1} + q^{-1} = r^{-1} + 1$. More precisely, there exist constants $B_p$ independent of $K$ such that

$$[Tf]_q \le B_1C\|f\|_1, \qquad \|Tf\|_r \le B_pC\|f\|_p \quad (p > 1,\ r^{-1} = p^{-1} + q^{-1} - 1 > 0).$$


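Proof. Let $p'$, $q'$ be the conjugate exponents to $p$, $q$; then

$$\frac{1}{q'} = 1 - \frac{1}{q} = \frac{1}{p} - \frac{1}{r} < \frac{1}{p}, \qquad \frac{1}{p'} = 1 - \frac{1}{p} = \frac{1}{q} - \frac{1}{r} < \frac{1}{q},$$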
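so $p < q'$ and $q < p'$. Suppose $0 \ne f \in L^p$ ($1 \le p < q'$); by multiplying $f$ and $K$ by constants, we may assume that $\|f\|_p = C = 1$. Given a positive number $A$ whose value will be fixed later, define

$$E = \{(x,y) : |K(x,y)| > A\}, \qquad K_1 = (\operatorname{sgn} K)(|K| - A)\chi_E, \qquad K_2 = K - K_1,$$

and let $T_1$, $T_2$ be the operators corresponding to $K_1$, $K_2$. Then by Propositions 6.24 and 6.25, since $q > 1$ we have

$$\int |K_1(x,y)|\,d\nu(y) = \int_0^\infty \lambda_{K_1(x,\cdot)}(\alpha)\,d\alpha = \int_A^\infty \lambda_{K(x,\cdot)}(\alpha)\,d\alpha \le \int_A^\infty \alpha^{-q}\,d\alpha = \frac{A^{1-q}}{q-1},$$

and likewise

$$\int |K_1(x,y)|\,d\mu(x) \le \frac{A^{1-q}}{q-1}.$$

Hence, by Theorem 6.18, the integral defining $T_1f(x)$ converges for a.e. $x$ and

$$\|T_1f\|_p \le \frac{A^{1-q}}{q-1}\|f\|_p = \frac{A^{1-q}}{q-1}. \tag{6.37}$$

Similarly, since $q < p'$,

$$\int |K_2(x,y)|^{p'}\,d\nu(y) = p'\int_0^\infty \alpha^{p'-1}\lambda_{K_2(x,\cdot)}(\alpha)\,d\alpha \le p'\int_0^A \alpha^{p'-1-q}\,d\alpha = \frac{p'A^{p'-q}}{p'-q}.$$

Therefore, by Hölder's inequality, the integral defining $T_2f(x)$ converges for every $x$, and

$$\|T_2f\|_\infty \le \left[\frac{p'A^{p'-q}}{p'-q}\right]^{1/p'}\|f\|_p = \left[\frac{p'}{p'-q}\right]^{1/p'}A^{1-(q/p')}. \tag{6.38}$$

We have thus established that $Tf = T_1f + T_2f$ is well defined a.e.
Next, given $\alpha > 0$, we wish to estimate $\lambda_{Tf}(\alpha)$. But by Proposition 6.22d,

$$\lambda_{Tf}(\alpha) \le \lambda_{T_1f}(\tfrac12\alpha) + \lambda_{T_2f}(\tfrac12\alpha),$$

and by (6.38), if we choose (note that $1 - q/p' = q/r$)

$$A = \left[\frac{\alpha}{2}\left(\frac{p'-q}{p'}\right)^{1/p'}\right]^{r/q},$$

we will have $\|T_2f\|_\infty \le \frac12\alpha$, so that $\lambda_{T_2f}(\frac12\alpha) = 0$. With this choice of $A$, then, by (6.37) and Chebyshev's inequality we obtain

$$\lambda_{Tf}(\alpha) \le \lambda_{T_1f}(\tfrac12\alpha) \le \left[\frac{2\|T_1f\|_p}{\alpha}\right]^p \le \left[\frac{2A^{1-q}}{(q-1)\alpha}\right]^p = \frac{2^{p-(1-q)pr/q}}{(q-1)^p}\left(\frac{p'-q}{p'}\right)^{(1-q)pr/(qp')}\alpha^{-p+(1-q)pr/q} = C_p\alpha^{-r},$$

because $\|f\|_p = 1$ and

$$\frac{(1-q)pr}{q} = pr\left(\frac1q - 1\right) = pr\left(\frac1r - \frac1p\right) = p - r.$$

A simple homogeneity argument now yields the estimate $\lambda_{Tf}(\alpha) \le C_p(\|f\|_p/\alpha)^r$ with no restriction on $\|f\|_p$, so we have shown that $T$ is weak type $(p,r)$, and in particular (for $p = 1$) weak type $(1,q)$.

Finally, given $p \in (1,q')$, choose $\tilde p \in (p,q')$ and define $\tilde r$ by $\tilde r^{-1} = \tilde p^{-1} - (q')^{-1}$. Then $T$ is weak types $(1,q)$ and $(\tilde p,\tilde r)$, so it follows from the Marcinkiewicz theorem that $T$ is strong type $(p,r)$. ∎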





Exercises


41. Suppose $1 \le p \le \infty$ and $p^{-1} + q^{-1} = 1$. If $T$ is a bounded operator on $L^p$ such that $\int (Tf)g = \int f(Tg)$ for all $f, g \in L^p \cap L^q$, then $T$ extends uniquely to a bounded operator on $L^r$ for all $r$ in $[p,q]$ (if $p \le q$) or $[q,p]$ (if $q < p$).


42. Prove the Marcinkiewicz theorem in the case $p_0 = p_1$. (Setting $p = p_0 = p_1$, we have $\lambda_{Tf}(\alpha) \le (C_0\|f\|_p/\alpha)^{q_0}$ and $\lambda_{Tf}(\alpha) \le (C_1\|f\|_p/\alpha)^{q_1}$. Use whichever estimate is better, depending on $\alpha$, to majorize $q\int_0^\infty \alpha^{q-1}\lambda_{Tf}(\alpha)\,d\alpha$.)


43. Let $H$ be the Hardy-Littlewood maximal operator on $\mathbb{R}$. Compute $H\chi_{(0,1)}$ explicitly. Show that it is in $L^p$ for all $p > 1$ and in weak $L^1$ but not in $L^1$, and that its $L^p$ norm tends to $\infty$ like $(p-1)^{-1}$ as $p \to 1$, although $\|\chi_{(0,1)}\|_p = 1$ for all $p$.


44. Let $I_\alpha$ be the fractional integration operator of Exercise 61 in §2.6. If $0 < \alpha < 1$, $1 < p < \alpha^{-1}$, and $r^{-1} = p^{-1} - \alpha$, then $I_\alpha$ is weak type $(1,(1-\alpha)^{-1})$ and strong type $(p,r)$ with respect to Lebesgue measure on $(0,\infty)$.


45. If $0 < \alpha < n$, define an operator $T_\alpha$ on functions on $\mathbb{R}^n$ by

$$T_\alpha f(x) = \int |x - y|^{\alpha-n}f(y)\,dy.$$

Then $T_\alpha$ is weak type $(1,n(n-\alpha)^{-1})$ and strong type $(p,r)$ with respect to Lebesgue measure on $\mathbb{R}^n$, where $1 < p < n\alpha^{-1}$ and $r^{-1} = p^{-1} - \alpha n^{-1}$. (The case $n = 3$, $\alpha = 2$ is of particular interest in physics: If $f$ represents the density of a mass or charge distribution, $-(4\pi)^{-1}T_2f$ represents the induced gravitational or electrostatic potential.)


6.6 NOTES AND REFERENCES 


The importance of the space $L^2([a,b])$ was recognized soon after the invention of the Lebesgue integral because of its connection with Fourier series and other orthogonal expansions; and one of the early triumphs of the Lebesgue theory was the discovery in 1907 by Fischer [44] and F. Riesz [114] that $L^2([a,b])$ is isomorphic to $l^2$, or what amounts to the same thing, that $L^2([a,b])$ is complete. The spaces $L^p([a,b])$ for $1 < p < \infty$ were first investigated by F. Riesz [117], who proved all of the major results in §§6.1-2 for them as well as the weak sequential compactness of the closed unit ball in $L^p$. The fact that $(L^1)^* = L^\infty$ was first proved by Steinhaus [143].

In some respects it is unfortunate that $L^p$ spaces were not named $L^{1/p}$ spaces, for, as one sees in the conjugacy relation $p^{-1} + q^{-1} = 1$ and in the results of §6.5, relationships among different $L^p$ spaces usually involve linear equations in $p^{-1}$.

A discussion of some of the deeper aspects of $L^p$ spaces and their applications in other areas of analysis can be found in Lieb and Loss [93].


§6.1: Hölder's inequality, in the case $p = 2$, is commonly associated with the names of Cauchy (who proved it for finite sums) and Buniakovsky and Schwarz (who proved it, independently, for integrals). For general $p$ it was discovered independently by Hölder and Rogers. Minkowski's original inequality was for finite sums. (See Hardy, Littlewood, and Pólya [66] for references.) A neat proof of Hölder's inequality using complex function theory can be found in Rubel [122].

The relations among the spaces $L^p + L^q$, defined in Exercise 4, are studied in Alvarez [5]. See Romero [120] for a discussion of Exercise 5, including some other conditions for the inclusion $L^p \subset L^q$ to hold, and Miamee [100] for a discussion of the more general relation $L^p(\mu) \subset L^q(\nu)$.


§6.2: A quite different approach to the $L^p$ duality theory for $1 < p < \infty$ can be found in Hewitt and Stromberg [76, §15]. J. Schwartz [130] has found a characterization of $(L^1)^*$ that is valid on arbitrary measure spaces.

The proof of Theorem 6.15 breaks down for $p = \infty$ because the set function $\nu(E) = \phi(\chi_E)$ need not be countably additive. It is, however, a bounded, finitely additive complex measure on $(X,\mathcal{M})$ that is absolutely continuous with respect to $\mu$ in the sense that $\nu(E) = 0$ whenever $\mu(E) = 0$. Conversely, given a bounded, finitely additive complex measure $\nu$ on $(X,\mathcal{M})$, one can define the integral of a bounded measurable function with respect to $\nu$. (One defines $\int f\,d\nu$ in the obvious way when $f$ is simple and then shows that $|\int f\,d\nu| \le C\|f\|_u$, so that the integral extends to all uniform limits of simple functions.) In this way one obtains a representation of $(L^\infty)^*$ as a space of finitely additive complex measures. See Hewitt and Stromberg [76, §20], and for a more general treatment of finitely additive integrals, Dunford and Schwartz [35, Chapter 3]. (The example of a $\phi \in (L^\infty)^* \setminus L^1$ that we presented at the end of §6.2 shows how horrible finitely additive measures can be: If $\nu(E) = \phi(\chi_E)$, then $\nu \ll m$, but $\nu$ behaves like the point mass at zero when integrated against any continuous function.)


§6.3: Theorem 6.18 generalizes results of Schur [129] (for the case $p = 2$) and W. H. Young [164] (for the case $K(x,y) = k(x-y)$; see §8.2). Theorem 6.20 is also essentially due to Schur [129].

The reader whose appetite for inequalities is not satisfied by this section can find 
a feast in Hardy, Littlewood, and Pólya [66]. 


§6.4: The weak $L^p$ spaces first appeared implicitly in weak-type estimates, instances of which go back to the 1920s; see also the notes for §6.5 below. Decreasing rearrangements (Exercise 40) were introduced by Hardy and Littlewood [65], who give an entertaining motivation of their principal theorem on rearrangements in terms of cricket averages.


§6.5: The Riesz-Thorin theorem was first proved by M. Riesz (F. Riesz’s younger
brother) [118] under the assumption that $p_j \le q_j$ for $j = 0, 1$; the proof in the general
case and the idea of using the three lines lemma are due to Thorin [149]. E. M. Stein
has proved a very powerful generalization of the Riesz-Thorin theorem. It deals
with a family $\{T_z : 0 \le \operatorname{Re} z \le 1\}$ of operators that (roughly speaking) depend
holomorphically on $z$ and satisfy some mild growth conditions as $|\operatorname{Im} z| \to \infty$, and
it asserts that if $T_z$ is bounded from $L^{p_j}$ to $L^{q_j}$ for $\operatorname{Re} z = j$ ($j = 0, 1$), then $T_z$ is
bounded from $L^{p_t}$ to $L^{q_t}$ for $\operatorname{Re} z = t$ ($0 < t < 1$), where $p_t, q_t$ are defined as in the
Riesz-Thorin theorem. The precise statement and proof can be found in Bennett and
Sharpley [15, §4.3], Stein and Weiss [142, §V.4], or Zygmund [167, §XII.1]. For a
further extension of these ideas, see Coifman et al. [28].

The Marcinkiewicz interpolation theorem was announced by Marcinkiewicz [97]
for the case $p_j = q_j$ ($j = 0, 1$); after his untimely death in World War II, the work
was completed by Zygmund [166]. The theorem can be proved under still weaker
hypotheses on $T$; an extra twist to the argument we have given yields the same result
under the sole assumption that $|T(f + g)| \le C(|Tf| + |Tg|)$ for some constant $C$.
See Zygmund [166], [167, §XII.4]. The spaces $L^p$ and weak $L^p$ form part of a
two-parameter family $\{L(p,q) : 1 \le p, q \le \infty\}$ of function spaces, the so-called Lorentz
spaces, such that $L^p = L(p,p)$ and weak $L^p = L(p,\infty)$, and the Marcinkiewicz
theorem can be extended to a result about interpolation of operators on the $L(p,q)$
spaces. See Bennett and Sharpley [15, §4.4], or Stein and Weiss [142, §V.3].

There are many other examples of “continuous families” of Banach spaces for
which interpolation theorems can be proved, for example, the spaces $\Lambda_\alpha$ discussed
in Exercise 11 in §5.1 and the Sobolev spaces discussed in §9.3. There are also two
general techniques for constructing “intermediate spaces” between pairs of Banach
spaces, known as the “complex method” and the “real method,” which may be
regarded as abstract forms of the Riesz-Thorin and Marcinkiewicz theorems. An
account of these theories and their applications can be found in Bergh and Löfström
[16]; see also Bennett and Sharpley [15] for the real method and its applications.

Corollary 6.35 is due to Hardy and Littlewood [65]. Theorem 6.36 appears first 
in Folland and Stein [51], but the essential idea of the proof was discovered by Stein 
several years earlier (see, e.g., Stein [140, §5.1]), and the special case discussed in 
Exercise 44 goes back to Hardy and Littlewood [64]. 





Radon Measures 


The subject of this chapter is measure and integration theory on locally compact 
Hausdorff (LCH) spaces. We have seen in §2.6 that Lebesgue measure on $\mathbb{R}^n$
interacts nicely with the topology on $\mathbb{R}^n$ (measurable sets can be approximated by
open or compact sets, and integrable functions can be approximated by continuous
functions), and it is of interest to study measures having similar properties on
more general spaces. Moreover, it turns out that certain linear functionals on spaces 
of continuous functions are given by integration against such measures. This fact 
constitutes an important link between measure theory and functional analysis, and it 
also provides a powerful tool for constructing measures. 

Throughout this chapter, $X$ will denote an LCH space. We continue to employ
the terminology developed in Chapter 1 in the context of metric spaces: $\mathcal{B}_X$ will
denote the Borel $\sigma$-algebra on $X$, that is, the $\sigma$-algebra generated by the open sets;
measures on $\mathcal{B}_X$ will be called Borel measures; countable unions (intersections) of
closed (open) sets will be called $F_\sigma$ ($G_\delta$) sets, and so forth.


7.1 POSITIVE LINEAR FUNCTIONALS ON $C_c(X)$


We recall that $C_c(X)$ is the space of continuous functions on $X$ with compact support.
A linear functional $I$ on $C_c(X)$ will be called positive if $I(f) \ge 0$ whenever $f \ge 0$.
In this definition there is no mention of continuity, but it is worth noting that positivity
itself implies a rather strong continuity property.

7.1 Proposition. If $I$ is a positive linear functional on $C_c(X)$, for each compact
$K \subset X$ there is a constant $C_K$ such that $|I(f)| \le C_K\|f\|_u$ for all $f \in C_c(X)$ such
that $\operatorname{supp}(f) \subset K$.


Proof. It suffices to consider real-valued $f$. Given a compact $K$, choose
$\phi \in C_c(X, [0, 1])$ such that $\phi = 1$ on $K$ (Urysohn’s lemma). Then if $\operatorname{supp}(f) \subset K$,
we have $|f| \le \|f\|_u\phi$, that is, $\|f\|_u\phi - f \ge 0$ and $\|f\|_u\phi + f \ge 0$. Thus
$\|f\|_u I(\phi) - I(f) \ge 0$ and $\|f\|_u I(\phi) + I(f) \ge 0$, so that $|I(f)| \le I(\phi)\|f\|_u$. ∎
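As a sanity check on Proposition 7.1, here is a minimal Python sketch in a toy setting chosen only for illustration (an assumption, not the general LCH case): $X$ is a finite discrete space, so every function is continuous with compact support, and $I(f) = \sum_i w_i f(x_i)$ with all weights $w_i \ge 0$. On a discrete space Urysohn’s lemma is trivial, so $\phi = \chi_K$ works and the constant is $C_K = I(\chi_K)$. The helper names are hypothetical.

```python
import random


def positive_functional(weights, f):
    """I(f) = sum_i w_i f(x_i); positive because every weight w_i is >= 0."""
    return sum(w * v for w, v in zip(weights, f))


def constant_for(weights, K):
    """C_K = I(phi) with phi = indicator of K (continuous on a discrete space, 0 <= phi <= 1)."""
    phi = [1.0 if i in K else 0.0 for i in range(len(weights))]
    return positive_functional(weights, phi)


def check_bound(weights, K, trials=1000, seed=0):
    """Test |I(f)| <= C_K ||f||_u on random real-valued f with supp(f) contained in K."""
    rng = random.Random(seed)
    C = constant_for(weights, K)
    for _ in range(trials):
        # f vanishes off K, as in the hypothesis supp(f) ⊂ K
        f = [rng.uniform(-5.0, 5.0) if i in K else 0.0 for i in range(len(weights))]
        sup_norm = max(abs(v) for v in f)
        if abs(positive_functional(weights, f)) > C * sup_norm + 1e-12:
            return False
    return True
```

For weights `[0.5, 2.0, 1.0, 3.0]` and `K = {1, 2}` the constant is `3.0`, and the random trials never exceed the bound, mirroring the two inequalities $\|f\|_u I(\phi) \pm I(f) \ge 0$ in the proof.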


If $\mu$ is a Borel measure on $X$ such that $\mu(K) < \infty$ for every compact $K \subset X$,
then clearly $C_c(X) \subset L^1(\mu)$, so the map $f \mapsto \int f\,d\mu$ is a positive linear functional
on $C_c(X)$. The principal result of this section is that every positive linear functional
on $C_c(X)$ arises in this fashion; moreover, one can impose some additional regularity
conditions on $\mu$, subject to which $\mu$ is unique. These conditions are as follows.

Let $\mu$ be a Borel measure on $X$ and $E$ a Borel subset of $X$. The measure $\mu$ is
called outer regular on $E$ if
$$\mu(E) = \inf\{\mu(U) : U \supset E,\ U \text{ open}\}$$
and inner regular on $E$ if
$$\mu(E) = \sup\{\mu(K) : K \subset E,\ K \text{ compact}\}.$$


If $\mu$ is outer and inner regular on all Borel sets, $\mu$ is called regular. It turns out that
regularity is a bit too much to ask for when $X$ is not $\sigma$-compact, so we adopt the
following definition. A Radon measure on $X$ is a Borel measure that is finite on all
compact sets, outer regular on all Borel sets, and inner regular on all open sets. We
shall show in §7.2 that Radon measures are also inner regular on all of their $\sigma$-finite
sets.

One further bit of notation: If $U$ is open in $X$ and $f \in C_c(X)$, we shall write
$$f \prec U$$
to mean that $0 \le f \le 1$ and $\operatorname{supp}(f) \subset U$. (This is slightly stronger than the
condition $0 \le f \le \chi_U$, which implies only that $\operatorname{supp}(f) \subset \overline{U}$.)


7.2 The Riesz Representation Theorem. If $I$ is a positive linear functional on
$C_c(X)$, there is a unique Radon measure $\mu$ on $X$ such that $I(f) = \int f\,d\mu$ for
all $f \in C_c(X)$. Moreover, $\mu$ satisfies
$$\mu(U) = \sup\{I(f) : f \in C_c(X),\ f \prec U\} \quad \text{for all open } U \subset X \tag{7.3}$$
and
$$\mu(K) = \inf\{I(f) : f \in C_c(X),\ f \ge \chi_K\} \quad \text{for all compact } K \subset X. \tag{7.4}$$


Proof. Let us begin by establishing uniqueness. If $\mu$ is a Radon measure such
that $I(f) = \int f\,d\mu$ for all $f \in C_c(X)$, and $U \subset X$ is open, then clearly $I(f) \le \mu(U)$
whenever $f \prec U$. On the other hand, if $K \subset U$ is compact, by Urysohn’s lemma there
is an $f \in C_c(X)$ such that $f \prec U$ and $f = 1$ on $K$, whence $\mu(K) \le \int f\,d\mu = I(f)$.
Since $\mu$ is inner regular on $U$, it follows that (7.3) is satisfied. Thus $\mu$ is determined
by $I$ on open sets, and hence on all Borel sets because of outer regularity.

This argument proves the uniqueness of $\mu$ and also suggests how to go about
proving existence. Namely, we begin by defining
$$\mu(U) = \sup\{I(f) : f \in C_c(X),\ f \prec U\}$$
for $U$ open, and we then define $\mu^*(E)$ for an arbitrary $E \subset X$ by
$$\mu^*(E) = \inf\{\mu(U) : U \supset E,\ U \text{ open}\}.$$

Clearly $\mu(U) \le \mu(V)$ if $U \subset V$, and hence $\mu^*(U) = \mu(U)$ if $U$ is open.
The outline of the proof is now as follows. We shall establish that

i. $\mu^*$ is an outer measure.
ii. Every open set is $\mu^*$-measurable.

At this point it follows from Carathéodory’s theorem that every Borel set is
$\mu^*$-measurable and that $\mu = \mu^*|\mathcal{B}_X$ is a Borel measure. (The notation is consistent
because $\mu^*(U) = \mu(U)$ for $U$ open.) The measure $\mu$ is outer regular and satisfies
(7.3) by definition. We next show that

iii. $\mu$ satisfies (7.4).

This clearly implies that $\mu$ is finite on compact sets, and inner regularity on open sets
also follows easily. Indeed, if $U$ is open and $\alpha < \mu(U)$, choose $f \in C_c(X)$ such
that $f \prec U$ and $I(f) > \alpha$, and let $K = \operatorname{supp}(f)$. If $g \in C_c(X)$ and $g \ge \chi_K$, then
$g - f \ge 0$ and hence $I(g) \ge I(f) > \alpha$. But then $\mu(K) \ge \alpha$ by (7.4), so $\mu$ is inner
regular on $U$. Finally, we prove that

iv. $I(f) = \int f\,d\mu$ for all $f \in C_c(X)$.

With this, the proof of the theorem will be complete.

Proof of (i): It suffices to show that if $\{U_j\}$ is a sequence of open sets and
$U = \bigcup_1^\infty U_j$, then $\mu(U) \le \sum_1^\infty \mu(U_j)$. Indeed, from this it follows that for any
$E \subset X$,
$$\mu^*(E) = \inf\Big\{\sum_1^\infty \mu(U_j) : U_j \text{ open},\ E \subset \bigcup_1^\infty U_j\Big\},$$
and the expression on the right defines an outer measure by Proposition 1.10. If
$U = \bigcup_1^\infty U_j$, $f \in C_c(X)$, and $f \prec U$, let $K = \operatorname{supp}(f)$. Since $K$ is compact, we have
$K \subset \bigcup_1^n U_j$ for some finite $n$, so by Proposition 4.41 there exist $g_1, \ldots, g_n \in C_c(X)$
with $g_j \prec U_j$ and $\sum_1^n g_j = 1$ on $K$. But then $f = \sum_1^n f g_j$ and $f g_j \prec U_j$, so
$$I(f) = \sum_1^n I(f g_j) \le \sum_1^n \mu(U_j) \le \sum_1^\infty \mu(U_j).$$
Since this is true for any $f \prec U$, we conclude that $\mu(U) \le \sum_1^\infty \mu(U_j)$ as desired.
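Proof of (ii): We must show that if $U$ is open and $E$ is any subset of $X$ such that
$\mu^*(E) < \infty$, then $\mu^*(E) \ge \mu^*(E \cap U) + \mu^*(E \setminus U)$. First suppose that $E$ is open.
Then $E \cap U$ is open, so given $\epsilon > 0$ we can find $f \in C_c(X)$ such that $f \prec E \cap U$
and $I(f) > \mu(E \cap U) - \epsilon$. Also, $E \setminus \operatorname{supp}(f)$ is open, so we can find $g \in C_c(X)$
such that $g \prec E \setminus \operatorname{supp}(f)$ and $I(g) > \mu(E \setminus \operatorname{supp}(f)) - \epsilon$. But then $f + g \prec E$, so
$$\mu(E) \ge I(f) + I(g) > \mu(E \cap U) + \mu(E \setminus \operatorname{supp}(f)) - 2\epsilon \ge \mu^*(E \cap U) + \mu^*(E \setminus U) - 2\epsilon.$$
Letting $\epsilon \to 0$, we obtain the desired inequality. For the general case, if $\mu^*(E) < \infty$
we can find an open $V \supset E$ such that $\mu(V) < \mu^*(E) + \epsilon$, and hence
$$\mu^*(E) + \epsilon > \mu(V) \ge \mu^*(V \cap U) + \mu^*(V \setminus U) \ge \mu^*(E \cap U) + \mu^*(E \setminus U).$$
Letting $\epsilon \to 0$, we are done.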

Proof of (iii): If $K$ is compact, $f \in C_c(X)$, and $f \ge \chi_K$, let $U_\epsilon = \{x : f(x) > 1 - \epsilon\}$.
Then $U_\epsilon$ is open, and if $g \prec U_\epsilon$, we have $(1 - \epsilon)^{-1}f - g \ge 0$ and so
$I(g) \le (1 - \epsilon)^{-1}I(f)$. Thus $\mu(K) \le \mu(U_\epsilon) \le (1 - \epsilon)^{-1}I(f)$, and letting $\epsilon \to 0$
we see that $\mu(K) \le I(f)$. On the other hand, for any open $U \supset K$, by Urysohn’s
lemma there exists $f \in C_c(X)$ such that $f \ge \chi_K$ and $f \prec U$, whence $I(f) \le \mu(U)$.
Since $\mu$ is outer regular on $K$, (7.4) follows.

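Proof of (iv): It suffices to show that $I(f) = \int f\,d\mu$ if $f \in C_c(X, [0,1])$, as
$C_c(X)$ is the linear span of the latter set. Given $N \in \mathbb{N}$, for $1 \le j \le N$ let
$K_j = \{x : f(x) \ge j/N\}$ and let $K_0 = \operatorname{supp}(f)$. Also, define $f_1, \ldots, f_N \in C_c(X)$
by $f_j(x) = 0$ if $x \notin K_{j-1}$, $f_j(x) = f(x) - (j-1)N^{-1}$ if $x \in K_{j-1} \setminus K_j$, and
$f_j(x) = N^{-1}$ if $x \in K_j$. In other words,
$$f_j = \min\Big[\max\Big(f - \frac{j-1}{N},\, 0\Big),\, \frac{1}{N}\Big].$$
Then $N^{-1}\chi_{K_j} \le f_j \le N^{-1}\chi_{K_{j-1}}$, hence
$$\frac{1}{N}\mu(K_j) \le \int f_j\,d\mu \le \frac{1}{N}\mu(K_{j-1}).$$
Also, if $U$ is an open set containing $K_{j-1}$ we have $Nf_j \prec U$ and so $I(f_j) \le
N^{-1}\mu(U)$. Hence, by (7.4) and outer regularity,
$$\frac{1}{N}\mu(K_j) \le I(f_j) \le \frac{1}{N}\mu(K_{j-1}).$$
Moreover, $f = \sum_1^N f_j$, so that
$$\frac{1}{N}\sum_{j=1}^{N} \mu(K_j) \le \int f\,d\mu \le \frac{1}{N}\sum_{j=0}^{N-1} \mu(K_j), \qquad \frac{1}{N}\sum_{j=1}^{N} \mu(K_j) \le I(f) \le \frac{1}{N}\sum_{j=0}^{N-1} \mu(K_j).$$
It follows that
$$\Big|I(f) - \int f\,d\mu\Big| \le \frac{\mu(K_0) - \mu(K_N)}{N} \le \frac{\mu(\operatorname{supp}(f))}{N}.$$
Since $\mu(\operatorname{supp}(f)) < \infty$ and $N$ is arbitrary, we conclude that $I(f) = \int f\,d\mu$. ∎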
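The slicing $f = \sum_1^N f_j$ and the two-sided bounds in the proof of (iv) can be checked exactly on a finite model. The sketch below assumes, purely for illustration, that $X$ is a finite discrete space and $\mu = \sum_i w_i\delta_{x_i}$, so that $\operatorname{supp}(f) = \{f > 0\}$; it uses exact rational arithmetic so the identities hold with equality rather than up to rounding. The function names are hypothetical.

```python
from fractions import Fraction


def slices(f, N):
    """Return [f_1, ..., f_N] with f_j = min(max(f - (j-1)/N, 0), 1/N), pointwise on a list f."""
    return [
        [min(max(v - Fraction(j - 1, N), Fraction(0)), Fraction(1, N)) for v in f]
        for j in range(1, N + 1)
    ]


def measure(weights, f, pred):
    """mu({x_i : pred(f(x_i))}) for the discrete measure mu = sum_i weights[i] * delta_{x_i}."""
    return sum((w for w, v in zip(weights, f) if pred(v)), Fraction(0))


def sandwich(weights, f, N):
    """Return (lower, integral, upper, [mu(K_0), ..., mu(K_N)]) as in the proof of (iv)."""
    # K_0 = supp(f) = {f > 0} on a discrete space; K_j = {f >= j/N} for 1 <= j <= N
    mus = [measure(weights, f, lambda v: v > 0)]
    mus += [measure(weights, f, lambda v, j=j: v >= Fraction(j, N)) for j in range(1, N + 1)]
    lower = sum(mus[1:], Fraction(0)) / N  # (1/N) * sum_{j=1}^{N} mu(K_j)
    upper = sum(mus[:N], Fraction(0)) / N  # (1/N) * sum_{j=0}^{N-1} mu(K_j)
    integral = sum((w * v for w, v in zip(weights, f)), Fraction(0))
    return lower, integral, upper, mus
```

With weights $1, 2, \tfrac12, 3$, values $f = 0, \tfrac13, \tfrac{7}{10}, 1$ and $N = 5$, the integral lies between the two bounds, whose gap is exactly $(\mu(K_0) - \mu(K_N))/N$.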


The proof of this theorem yields something stronger than the statement: We obtain
not just a Borel measure $\mu$ but an extension $\overline{\mu}$ of $\mu$ to the $\sigma$-algebra of $\mu^*$-measurable
sets. However, it follows from outer regularity that for any $E \subset X$,
$$\mu^*(E) = \inf\{\mu(B) : B \in \mathcal{B}_X,\ B \supset E\},$$
so $\mu^*$ is the outer measure induced by $\mu$ in the sense of §1.4. According to Exercise
22 in §1.4, therefore, $\overline{\mu}$ is the completion of $\mu$ if $\mu$ is $\sigma$-finite and is the saturation of
the completion of $\mu$ in general.

On the other hand, some authors prefer to restrict attention to a smaller $\sigma$-algebra
than $\mathcal{B}_X$, namely, the $\sigma$-algebra $\mathcal{B}_X^0$ generated by $C_c(X)$ (that is, the smallest
$\sigma$-algebra with respect to which every $f \in C_c(X)$ is measurable). The elements of $\mathcal{B}_X^0$
are called Baire sets. For more about Baire sets, see Exercises 4-6.


Exercises 


1. Let $X$ be an LCH space, $Y$ a closed subset of $X$ (which is an LCH space in
the relative topology), and $\mu$ a Radon measure on $Y$. Then $I(f) = \int (f|Y)\,d\mu$ is a
positive linear functional on $C_c(X)$, and the induced Radon measure $\nu$ on $X$ is given
by $\nu(E) = \mu(E \cap Y)$.


2. Let $\mu$ be a Radon measure on $X$.
a. Let $N$ be the union of all open $U \subset X$ such that $\mu(U) = 0$. Then $N$ is open
and $\mu(N) = 0$. The complement of $N$ is called the support of $\mu$.
b. $x \in \operatorname{supp}(\mu)$ iff $\int f\,d\mu > 0$ for every $f \in C_c(X, [0, 1])$ such that $f(x) > 0$.


3. Let $X$ be the one-point compactification of a set with the discrete topology. If $\mu$
is a Radon measure on $X$, then $\operatorname{supp}(\mu)$ (see Exercise 2) is countable.


4. Let $X$ be an LCH space.
a. If $f \in C_c(X, [0, \infty))$, then $f^{-1}([\alpha, \infty))$ is a compact $G_\delta$ set for all $\alpha > 0$.
b. If $K \subset X$ is a compact $G_\delta$ set, there exists $f \in C_c(X, [0, 1])$ such that
$K = f^{-1}(\{1\})$.
c. The $\sigma$-algebra $\mathcal{B}_X^0$ of Baire sets is the $\sigma$-algebra generated by the compact
$G_\delta$ sets.


5. Let $X$ be a second countable LCH space.
a. Every compact subset of $X$ is a $G_\delta$ set.
b. $\mathcal{B}_X = \mathcal{B}_X^0$.


6. Let $X$ be an uncountable set with the discrete topology, or the one-point
compactification of such a set. Then $\mathcal{B}_X \ne \mathcal{B}_X^0$.


7.2 REGULARITY AND APPROXIMATION THEOREMS 


In this section we explore the properties of Radon measures in more detail.

7.5 Proposition. Every Radon measure is inner regular on all of its $\sigma$-finite sets.


Proof. Suppose that $\mu$ is Radon and $E$ is $\sigma$-finite. If $\mu(E) < \infty$, for any $\epsilon > 0$
we can choose an open $U \supset E$ such that $\mu(U) < \mu(E) + \epsilon$ and a compact $F \subset U$
such that $\mu(F) > \mu(U) - \epsilon$. Since $\mu(U \setminus E) < \epsilon$, we can also choose an open
$V \supset U \setminus E$ such that $\mu(V) < \epsilon$. Let $K = F \setminus V$. Then $K$ is compact, $K \subset E$, and
$$\mu(K) = \mu(F) - \mu(F \cap V) > \mu(E) - \epsilon - \mu(V) > \mu(E) - 2\epsilon.$$
Thus $\mu$ is inner regular on $E$. On the other hand, if $\mu(E) = \infty$, $E$ is an increasing
union of sets $E_j$ with $\mu(E_j) < \infty$ and $\mu(E_j) \to \infty$. Thus for any $N \in \mathbb{N}$ there exists
$j$ such that $\mu(E_j) > N$ and hence, by the preceding argument, a compact $K \subset E_j$
with $\mu(K) > N$. Hence $\mu$ is inner regular on $E$. ∎


7.6 Corollary. Every $\sigma$-finite Radon measure is regular. If $X$ is $\sigma$-compact, every
Radon measure on $X$ is regular.


For an example of a nonregular Radon measure, see Exercise 12. 


7.7 Proposition. Suppose that $\mu$ is a $\sigma$-finite Radon measure on $X$ and $E$ is a Borel
set in $X$.

a. For every $\epsilon > 0$ there exist an open $U$ and a closed $F$ with $F \subset E \subset U$ and
$\mu(U \setminus F) < \epsilon$.

b. There exist an $F_\sigma$ set $A$ and a $G_\delta$ set $B$ such that $A \subset E \subset B$ and $\mu(B \setminus A) = 0$.

Proof. Write $E = \bigcup_1^\infty E_j$ where the $E_j$’s are disjoint and have finite measure.
For each $j$, choose an open $U_j \supset E_j$ with $\mu(U_j) < \mu(E_j) + \epsilon 2^{-j-1}$ and let
$U = \bigcup_1^\infty U_j$. Then $U$ is open, $U \supset E$, and $\mu(U \setminus E) \le \sum_1^\infty \mu(U_j \setminus E_j) < \epsilon/2$.
Applying the same reasoning to $E^c$, we obtain an open $V \supset E^c$ with $\mu(V \setminus E^c) < \epsilon/2$.
Let $F = V^c$. Then $F$ is closed, $F \subset E$, and
$$\mu(U \setminus F) = \mu(U \setminus E) + \mu(E \setminus F) = \mu(U \setminus E) + \mu(V \setminus E^c) < \epsilon.$$
This proves (a), and (b) follows easily; details are left to the reader. ∎

7.8 Theorem. Let $X$ be an LCH space in which every open set is $\sigma$-compact (which
is the case, for example, if $X$ is second countable). Then every Borel measure on $X$
that is finite on compact sets is regular and hence Radon.


Proof. If $\mu$ is a Borel measure that is finite on compact sets, then $C_c(X) \subset L^1(\mu)$,
so the map $I(f) = \int f\,d\mu$ is a positive linear functional on $C_c(X)$. Let $\nu$ be
the associated Radon measure according to Theorem 7.2. If $U \subset X$ is open, let
$U = \bigcup_1^\infty K_j$ where each $K_j$ is compact. Choose $f_1 \in C_c(X)$ so that $f_1 \prec U$ and
$f_1 = 1$ on $K_1$. Proceeding inductively, for $n > 1$ choose $f_n \in C_c(X)$ so that $f_n \prec U$
and $f_n = 1$ on $\bigcup_1^n K_j$ and on $\bigcup_1^{n-1} \operatorname{supp}(f_j)$. Then $f_n$ increases pointwise to $\chi_U$
as $n \to \infty$, so
$$\mu(U) = \lim \int f_n\,d\mu = \lim \int f_n\,d\nu = \nu(U)$$
by the monotone convergence theorem. Next, if $E$ is any Borel set and $\epsilon > 0$, by
Proposition 7.7 there exist an open $V \supset E$ and a closed $F \subset E$ with $\nu(V \setminus F) < \epsilon$.
But $V \setminus F$ is open, so $\mu(V \setminus F) = \nu(V \setminus F) < \epsilon$. In particular, $\mu(V) < \mu(E) + \epsilon$,
so $\mu$ is outer regular. Also, $\mu(F) > \mu(E) - \epsilon$, and $F$ is $\sigma$-compact (since $X$ is), so
there exist compact $K_j \subset F$ with $\mu(K_j) \to \mu(F)$, whence $\mu$ is inner regular. Thus
$\mu$ is regular (and equal to $\nu$, by the uniqueness part of Theorem 7.2). ∎


Examples of non-Radon measures are considered in Exercises 13-15. In particular, 
Exercise 15 exhibits an example of a finite, non-Radon Borel measure on a compact 
Hausdorff space. 

We now turn to some approximation theorems for measurable functions. 


7.9 Proposition. If $\mu$ is a Radon measure on $X$, $C_c(X)$ is dense in $L^p(\mu)$ for
$1 \le p < \infty$.


Proof. Since the $L^p$ simple functions are dense in $L^p$ (Proposition 6.7), it suffices
to show that for any Borel set $E$ with $\mu(E) < \infty$, $\chi_E$ can be approximated in
the $L^p$ norm by elements of $C_c(X)$. Given $\epsilon > 0$, by Proposition 7.5 we can
choose a compact $K \subset E$ and an open $U \supset E$ such that $\mu(U \setminus K) < \epsilon$, and by
Urysohn’s lemma we can choose $f \in C_c(X)$ such that $\chi_K \le f \le \chi_U$. Then
$\|\chi_E - f\|_p \le \mu(U \setminus K)^{1/p} < \epsilon^{1/p}$, so we are done. ∎
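For a concrete instance of Proposition 7.9, take (as an illustrative assumption) $X = \mathbb{R}$ with Lebesgue measure, $E = K = [0,1]$, $U = (-\delta, 1 + \delta)$, and the piecewise linear Urysohn function $f$ that is $1$ on $[0,1]$ and drops linearly to $0$ at $-\delta$ and $1 + \delta$. Then $\|\chi_E - f\|_p^p = 2\int_0^\delta (1 - s/\delta)^p\,ds = 2\delta/(p+1)$, which is indeed at most $\mu(U \setminus K) = 2\delta$. The sketch below checks this with a midpoint rule; the function names are hypothetical, and agreement is only up to discretization error.

```python
def trapezoid(x, delta):
    """Urysohn-type function: 1 on K = [0, 1], 0 outside U = (-delta, 1 + delta), linear between."""
    if 0.0 <= x <= 1.0:
        return 1.0
    if -delta < x < 0.0:
        return 1.0 + x / delta
    if 1.0 < x < 1.0 + delta:
        return 1.0 - (x - 1.0) / delta
    return 0.0


def indicator(x):
    """chi_E for E = [0, 1]."""
    return 1.0 if 0.0 <= x <= 1.0 else 0.0


def lp_error(p, delta, n=100000):
    """Midpoint-rule value of ||chi_E - f||_p; both functions vanish outside [-delta, 1 + delta]."""
    a, b = -delta, 1.0 + delta
    h = (b - a) / n
    total = 0.0
    for k in range(n):
        x = a + (k + 0.5) * h
        total += abs(indicator(x) - trapezoid(x, delta)) ** p
    return (total * h) ** (1.0 / p)
```

Shrinking $\delta$ drives the error to zero like $\delta^{1/p}$, which is the mechanism behind the estimate $\|\chi_E - f\|_p \le \mu(U \setminus K)^{1/p}$.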


7.10 Lusin’s Theorem. Suppose that $\mu$ is a Radon measure on $X$ and $f : X \to \mathbb{C}$
is a measurable function that vanishes outside a set of finite measure. Then for any
$\epsilon > 0$ there exists $\phi \in C_c(X)$ such that $\phi = f$ except on a set of measure $< \epsilon$. If $f$
is bounded, $\phi$ can be taken to satisfy $\|\phi\|_u \le \|f\|_u$.


Proof. Let $E = \{x : f(x) \ne 0\}$, and suppose to begin with that $f$ is bounded.
Then $f \in L^1(\mu)$, so by Proposition 7.9 there is a sequence $\{g_n\}$ in $C_c(X)$ that
converges to $f$ in $L^1$, and hence by Corollary 2.32 a subsequence (still denoted by
$\{g_n\}$) that converges to $f$ a.e. By Egoroff’s theorem there is a set $A \subset E$ such that
$\mu(E \setminus A) < \epsilon/3$ and $g_n \to f$ uniformly on $A$, and there exist a compact $B \subset A$ and
an open $U \supset E$ such that $\mu(A \setminus B) < \epsilon/3$ and $\mu(U \setminus E) < \epsilon/3$. Since $g_n \to f$
uniformly on $B$, $f|B$ is continuous, so by Theorem 4.34 there exists $h \in C_c(X)$
such that $h = f$ on $B$ and $\operatorname{supp}(h) \subset U$. But then $\{x : f(x) \ne h(x)\}$ is contained
in $U \setminus B$, which has measure $< \epsilon$.

To complete the proof for $f$ bounded, define $\beta : \mathbb{C} \to \mathbb{C}$ by $\beta(z) = z$ if $|z| \le \|f\|_u$
and $\beta(z) = \|f\|_u \operatorname{sgn} z$ if $|z| > \|f\|_u$, and set $\phi = \beta \circ h$. Then $\phi \in C_c(X)$ since $\beta$
is continuous and $\beta(0) = 0$, $\|\phi\|_u \le \|f\|_u$, and $\phi = f$ on the set where
$h = f$, so we are done.

If $f$ is unbounded, let $A_n = \{x : 0 < |f(x)| \le n\}$. Then $A_n$ increases to $E$ as
$n \to \infty$, so $\mu(E \setminus A_n) < \epsilon/2$ for sufficiently large $n$. By the preceding argument
there exists $\phi \in C_c(X)$ such that $\phi = f\chi_{A_n}$ except on a set of measure $< \epsilon/2$, and
hence $\phi = f$ except on a set of measure $< \epsilon$. ∎
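The truncation map $\beta$ in the proof is the radial retraction of $\mathbb{C}$ onto the closed disk of radius $\|f\|_u$. A minimal Python sketch (hypothetical helper names; `M` stands for $\|f\|_u$) makes the two properties used in the proof visible: $|\beta(z)| \le M$ for every $z$, and $\beta(z) = z$ whenever $|z| \le M$, so $\phi = \beta \circ h$ agrees with $f$ wherever $h = f$.

```python
def beta(z: complex, M: float) -> complex:
    """Radial retraction onto the closed disk of radius M: z if |z| <= M, else M * sgn(z)."""
    r = abs(z)
    if r <= M:
        return z
    # sgn(z) = z/|z|; here r > M >= 0, so the division is safe
    return M * (z / r)


def compose(h_values, M):
    """Pointwise phi = beta o h on a list of sample values of h."""
    return [beta(v, M) for v in h_values]
```

The two formulas agree on the circle $|z| = M$, which is why $\beta$ is continuous and $\phi = \beta \circ h$ stays in $C_c(X)$ (note $\beta(0) = 0$).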











Our final group of results in this section concerns semicontinuous functions. If $X$
is a topological space, a function $f : X \to (-\infty, \infty]$ is called lower semicontinuous
(LSC) if $\{x : f(x) > a\}$ is open for all $a \in \mathbb{R}$, and $f : X \to [-\infty, \infty)$ is called
upper semicontinuous (USC) if $\{x : f(x) < a\}$ is open for all $a \in \mathbb{R}$.


7.11 Proposition. Let $X$ be a topological space.
a. If $U$ is open in $X$, then $\chi_U$ is LSC.
b. If $f$ is LSC and $c \in (0, \infty)$, then $cf$ is LSC.
c. If $\mathcal{G}$ is a family of LSC functions and $f(x) = \sup\{g(x) : g \in \mathcal{G}\}$, then $f$ is
LSC.
d. If $f_1$ and $f_2$ are LSC, so is $f_1 + f_2$.
e. If $X$ is an LCH space and $f$ is LSC and nonnegative, then
$$f(x) = \sup\{g(x) : g \in C_c(X),\ 0 \le g \le f\}.$$


Proof. (a) and (b) are obvious, and (c) follows from the observation that
$$f^{-1}((a, \infty]) = \bigcup_{g \in \mathcal{G}} g^{-1}((a, \infty]).$$
As for (d), if $f_1(x_0) + f_2(x_0) > a$, choose $\epsilon > 0$ so that $f_1(x_0) > a - f_2(x_0) + \epsilon$.
Then
$$\{x : (f_1 + f_2)(x) > a\} \supset \{x : f_1(x) > a - f_2(x_0) + \epsilon\} \cap \{x : f_2(x) > f_2(x_0) - \epsilon\},$$
which is a neighborhood of $x_0$. Thus $f_1 + f_2$ is LSC. Finally, if $X$ is LCH, $f(x) > 0$
and $0 < a < f(x)$, then $U = \{y : f(y) > a\}$ is an open set containing $x$, so by
Urysohn’s lemma there exists $g \in C_c(X)$ such that $g(x) = a$ and $0 \le g \le a\chi_U \le f$.
This establishes (e) when $f(x) > 0$, and (e) is trivial when $f(x) = 0$. ∎


There is, of course, a corresponding set of results for USC functions, whose 
formulation is left to the reader. The following result is a monotone convergence 
theorem for nets of LSC functions. 

7.12 Proposition. Let $\mathcal{G}$ be a family of nonnegative LSC functions on an LCH space
$X$ that is directed by $\le$ (that is, for every $g_1, g_2 \in \mathcal{G}$ there exists $g \in \mathcal{G}$ such that
$g_1 \le g$ and $g_2 \le g$). Let $f = \sup\{g : g \in \mathcal{G}\}$. If $\mu$ is any Radon measure on $X$, then
$\int f\,d\mu = \sup\{\int g\,d\mu : g \in \mathcal{G}\}$.


Proof. By Proposition 7.11c, $f$ is LSC and hence Borel measurable, and clearly
$\int f\,d\mu \ge \sup\{\int g\,d\mu : g \in \mathcal{G}\}$. To prove the reverse inequality, consider the sequence $\{\phi_n\}$ of
simple functions increasing to $f$ that was constructed in Theorem 2.10:
$$\phi_n = \sum_{j=1}^{2^{2n}} 2^{-n}\chi_{U_{nj}}, \quad \text{where } U_{nj} = \{x : f(x) > j2^{-n}\}.$$
By the monotone convergence theorem, given $\alpha < \int f\,d\mu$ we can fix $n$ large enough
so that $2^{-n}\sum_j \mu(U_{nj}) = \int \phi_n\,d\mu > \alpha$. Since $U_{nj}$ is open, there exist compact
$K_j \subset U_{nj}$ ($1 \le j \le 2^{2n}$) such that $2^{-n}\sum_j \mu(K_j) > \alpha$. Let $\psi = 2^{-n}\sum_j \chi_{K_j}$. For
each $x \in \bigcup_j K_j$ we have $f(x) > \phi_n(x) \ge \psi(x)$, so we can pick $g_x \in \mathcal{G}$ such that
$g_x(x) > \psi(x)$. But $-\chi_{K_j}$ is LSC, so $g_x - \psi$ is LSC by Proposition 7.11d, and hence
the set $V_x = \{y : \psi(y) < g_x(y)\}$ is open. Thus $\{V_x : x \in \bigcup_j K_j\}$ is an open cover
of $\bigcup_j K_j$, so there is a finite subcover $V_{x_1}, \ldots, V_{x_m}$. Pick $g \in \mathcal{G}$ such that $g_{x_k} \le g$
for $k = 1, \ldots, m$; then $\psi \le g$, so $\int g\,d\mu \ge \int \psi\,d\mu > \alpha$. Since $\alpha$ was any number less than
$\int f\,d\mu$, we are done. ∎
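The dyadic simple functions $\phi_n$ of Theorem 2.10, written in the form used above, depend at a point $x$ only on the value $v = f(x)$. The Python sketch below (an illustration with a hypothetical name; $v$ may be `math.inf`) evaluates $\phi_n$ at a point and is meant to check the facts the proof relies on: $\phi_n$ increases with $n$, $\phi_n(x) < f(x)$ whenever $f(x) > 0$, and $f - \phi_n \le 2^{-n}$ wherever $0 < f \le 2^n$.

```python
import math


def phi_n(n, v):
    """phi_n(x) = 2^{-n} * #{1 <= j <= 2^{2n} : j 2^{-n} < f(x)}, where v = f(x) in [0, inf]."""
    step = 2.0 ** -n  # an exact power of two, so j * step is computed exactly
    return step * sum(1 for j in range(1, 4 ** n + 1) if j * step < v)
```

For instance, at a point where $f = 1$ the values for $n = 0, 1, 2$ are $0, \tfrac12, \tfrac34$, and at a point where $f = \infty$ they are $1, 2, 4$, i.e. $2^n$.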


7.13 Corollary. If $\mu$ is Radon and $f$ is nonnegative and LSC, then
$$\int f\,d\mu = \sup\Big\{\int g\,d\mu : g \in C_c(X),\ 0 \le g \le f\Big\}.$$
Proof. Combine Propositions 7.11e and 7.12. ∎


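7.14 Proposition. If $\mu$ is a Radon measure and $f$ is a nonnegative Borel measurable
function, then
$$\int f\,d\mu = \inf\Big\{\int g\,d\mu : g \ge f \text{ and } g \text{ is LSC}\Big\}.$$
If $\{x : f(x) > 0\}$ is $\sigma$-finite, then
$$\int f\,d\mu = \sup\Big\{\int g\,d\mu : 0 \le g \le f \text{ and } g \text{ is USC}\Big\}.$$


Proof. Let $\{\phi_n\}$ be a sequence of nonnegative simple functions that increase
pointwise to $f$. Then $f = \phi_1 + \sum_2^\infty (\phi_n - \phi_{n-1})$, and each term in this series is a
nonnegative simple function, so we can write $f = \sum_1^\infty a_j\chi_{E_j}$ where $a_j > 0$. For the first
assertion we may assume $\int f\,d\mu < \infty$, so that $\mu(E_j) < \infty$ for all $j$. Given
$\epsilon > 0$, for each $j$ choose an open $U_j \supset E_j$ such that $\mu(U_j) < \mu(E_j) + \epsilon/(2^j a_j)$. Then
$g = \sum_1^\infty a_j\chi_{U_j}$ is LSC by Proposition 7.11, $g \ge f$, and $\int g\,d\mu < \int f\,d\mu + \epsilon$. This
establishes the first assertion. For the second, if $\alpha < \int f\,d\mu$, let $N$ be large enough
so that $\sum_1^N a_j\mu(E_j) > \alpha$. Since the $E_j$’s are $\sigma$-finite, by Proposition 7.5 there are
compact sets $K_j \subset E_j$ such that $\sum_1^N a_j\mu(K_j) > \alpha$. Thus if $g = \sum_1^N a_j\chi_{K_j}$, then $g$
is USC, $g \le f$, and $\int g\,d\mu > \alpha$. ∎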

Exercises


7. If $\mu$ is a $\sigma$-finite Radon measure on $X$ and $A \in \mathcal{B}_X$, the Borel measure $\mu_A$
defined by $\mu_A(E) = \mu(E \cap A)$ is a Radon measure. (See also Exercise 13.)


8. Suppose that $\mu$ is a Radon measure on $X$. If $\phi \in L^1(\mu)$ and $\phi \ge 0$, then
$\nu(E) = \int_E \phi\,d\mu$ is a Radon measure. (Use Corollary 3.6.)


9. Suppose that $\mu$ is a Radon measure on $X$ and $\phi \in C(X, (0, \infty))$. Let
$\nu(E) = \int_E \phi\,d\mu$, and let $\nu'$ be the Radon measure associated to the functional
$f \mapsto \int f\phi\,d\mu$ on $C_c(X)$.

a. If $U$ is open, $\nu(U) = \nu'(U)$. (Apply Corollary 7.13 to $\phi\chi_U$.)

b. $\nu$ is outer regular on all Borel sets. (Hint: The open sets
$V_k = \{x : 2^k < \phi(x) < 2^{k+2}\}$, $k \in \mathbb{Z}$, cover $X$.)

c. $\nu = \nu'$, and hence $\nu$ is a Radon measure. (See also Exercise 13.)


10. If $\mu$ is a Radon measure and $f \in L^1(\mu)$ is real-valued, for every $\epsilon > 0$ there exist
an LSC function $g$ and a USC function $h$ such that $h \le f \le g$ and $\int (g - h)\,d\mu < \epsilon$.


11. Suppose that $\mu$ is a Radon measure on $X$ such that $\mu(\{x\}) = 0$ for all $x \in X$,
and $A \in \mathcal{B}_X$ satisfies $0 < \mu(A) < \infty$. Then for any $\alpha$ such that $0 < \alpha < \mu(A)$
there is a Borel set $B \subset A$ such that $\mu(B) = \alpha$.


12. Let $X = \mathbb{R} \times \mathbb{R}_d$, where $\mathbb{R}_d$ denotes $\mathbb{R}$ with the discrete topology. If $f$ is a
function on $X$, let $f^y(x) = f(x, y)$; and if $E \subset X$, let $E^y = \{x : (x, y) \in E\}$.
a. $f \in C_c(X)$ iff $f^y \in C_c(\mathbb{R})$ for all $y$ and $f^y = 0$ for all but finitely many $y$.
b. Define a positive linear functional on $C_c(X)$ by $I(f) = \sum_{y \in \mathbb{R}} \int f(x, y)\,dx$,
and let $\mu$ be the associated Radon measure on $X$. Then $\mu(E) = \infty$ for any $E$
such that $E^y \ne \emptyset$ for uncountably many $y$.
c. Let $E = \{0\} \times \mathbb{R}_d$. Then $\mu(E) = \infty$ but $\mu(K) = 0$ for all compact $K \subset E$.


13. In the setting of Exercise 12, let $A = (\mathbb{R} \setminus \{0\}) \times \mathbb{R}_d$ and $\phi(x, y) = |x|$. Then
the measures $\mu_A(E) = \mu(A \cap E)$ and $\nu(E) = \int_E \phi\,d\mu$ are not Radon. (Thus, the
hypotheses that $\mu$ be $\sigma$-finite in Exercise 7, that $\phi \in L^1(\mu)$ in Exercise 8, and that
$\phi > 0$ in Exercise 9, cannot be dropped.)


14. Let $\mu$ be a Radon measure on $X$, and let $\mu_0$ be the semifinite part of $\mu$ (see
Exercise 15 in §1.3).
a. $\mu_0$ is inner regular on all Borel sets.
b. $\mu_0$ is outer regular on all Borel sets $E$ such that $\mu(E) < \infty$.
c. $\int f\,d\mu = \int f\,d\mu_0$ for all $f \in C_c(X)$.
d. If $\mu$ is the measure of Exercise 12 and $m$ is Lebesgue measure on $\mathbb{R}$, then
$\mu_0(E) = \sum_{y \in \mathbb{R}} m(E^y)$ for any Borel set $E$.


15. Let $\Omega$ be the set of countable ordinals, $\omega_1$ the first uncountable ordinal, and
$\Omega^* = \Omega \cup \{\omega_1\}$. Let $\Omega^*$ be endowed with the order topology (see Exercise 9 in §4.1).
a. $\Omega^*$ is a compact Hausdorff space. (Hint: $\Omega^*$ contains no infinite strictly
decreasing sequences.)
b. $\Omega$ is an open set in $\Omega^*$ that is not $\sigma$-compact.
c. A subset $E$ of $\Omega$ is uncountable iff for each $x \in \Omega$ there exists $y \in E$ such
that $x < y$.
d. If $\{E_n\}$ is a sequence of uncountable closed sets in $\Omega^*$, then $\bigcap_1^\infty E_n$ is
uncountable. (If $\{x_j\}$ is an increasing sequence in $\Omega$ such that each $E_n$ contains
infinitely many $x_j$’s, then $\lim_{j \to \infty} x_j$ exists and is in $\bigcap_1^\infty E_n$.)
e. If $E \in \mathcal{B}_{\Omega^*}$, then either $E \cup \{\omega_1\}$ or $E^c \cup \{\omega_1\}$ contains an uncountable
closed set. (Hint: The set of all $E$ satisfying the latter condition is a $\sigma$-algebra.)
f. Define $\mu$ on $\mathcal{B}_{\Omega^*}$ by $\mu(E) = 1$ if $E \cup \{\omega_1\}$ contains an uncountable closed
set, $\mu(E) = 0$ otherwise. Then $\mu$ is a measure, $\mu(\{\omega_1\}) = 0$, but $\mu(U) = 1$ for
every open $U$ containing $\omega_1$.
g. If $f \in C(\Omega^*)$, there exists $x \in \Omega$ such that $f(y) = f(\omega_1)$ for $y > x$. (If
$E_n = \{x : |f(x) - f(\omega_1)| < n^{-1}\}$, then $E_n^c$ is countable.)
h. With $\mu$ as in (f), the Radon measure on $\Omega^*$ associated to the functional
$f \mapsto \int f\,d\mu$ is the point mass at $\omega_1$.


7.3 THE DUAL OF $C_0(X)$


We recall that for any LCH space $X$, $C_0(X)$ is the uniform closure of $C_c(X)$
(Proposition 4.35), and hence if $\mu$ is a Radon measure on $X$, the functional
$I(f) = \int f\,d\mu$ extends continuously to $C_0(X)$ iff it is bounded with respect to the uniform
norm. In view of the equality
$$\mu(X) = \sup\Big\{\int f\,d\mu : f \in C_c(X),\ 0 \le f \le 1\Big\}$$
(a special case of (7.3)) together with the fact that $|\int f\,d\mu| \le \int |f|\,d\mu$, this happens
precisely when $\mu(X) < \infty$, in which case $\mu(X)$ is the operator norm of $I$.

We have therefore identified the positive bounded linear functionals on $C_0(X)$:
they are given by integration against finite Radon measures. Our object in this section
is to extend this result to give a complete description of $C_0(X)^*$. The key fact is that
real linear functionals on $C_0(X, \mathbb{R})$ have a “Jordan decomposition.”


7.15 Lemma. If $I \in C_0(X, \mathbb{R})^*$, there exist positive functionals $I^\pm \in C_0(X, \mathbb{R})^*$
such that $I = I^+ - I^-$.


Proof. If $f \in C_0(X, [0, \infty))$, we define
$$I^+(f) = \sup\{I(g) : g \in C_0(X, \mathbb{R}),\ 0 \le g \le f\}.$$
Since $|I(g)| \le \|I\|\,\|g\|_u \le \|I\|\,\|f\|_u$ for $0 \le g \le f$, and $I(0) = 0$, we have
$0 \le I^+(f) \le \|I\|\,\|f\|_u$. We claim that $I^+$ is the restriction to $C_0(X, [0, \infty))$ of
a linear functional; the proof is much the same as the proof of the linearity of the
integral in §2.3.

Obviously $I^+(cf) = cI^+(f)$ if $c > 0$. Also, whenever $0 \le g_1 \le f_1$ and
$0 \le g_2 \le f_2$ we have $0 \le g_1 + g_2 \le f_1 + f_2$, so that $I^+(f_1 + f_2) \ge I(g_1) + I(g_2)$, and
it follows that $I^+(f_1 + f_2) \ge I^+(f_1) + I^+(f_2)$. On the other hand, if $0 \le g \le f_1 + f_2$,
let $g_1 = \min(g, f_1)$ and $g_2 = g - g_1$. Then $0 \le g_1 \le f_1$ and $0 \le g_2 \le f_2$, so
$I(g) = I(g_1) + I(g_2) \le I^+(f_1) + I^+(f_2)$; therefore $I^+(f_1 + f_2) \le I^+(f_1) + I^+(f_2)$.
In short, $I^+(f_1 + f_2) = I^+(f_1) + I^+(f_2)$.

Now, if $f \in C_0(X, \mathbb{R})$, then its positive and negative parts $f^+$ and $f^-$ are in
$C_0(X, [0, \infty))$, and we define $I^+(f) = I^+(f^+) - I^+(f^-)$. If also $f = g - h$ where
$g, h \ge 0$, then $g + f^- = h + f^+$, whence $I^+(g) + I^+(f^-) = I^+(h) + I^+(f^+)$.
Thus $I^+(f) = I^+(g) - I^+(h)$, and it follows easily as in the proof of Proposition
2.21 that $I^+$ is linear on $C_0(X, \mathbb{R})$. Moreover,
$$|I^+(f)| \le \max\big(I^+(f^+), I^+(f^-)\big) \le \|I\| \max\big(\|f^+\|_u, \|f^-\|_u\big) = \|I\|\,\|f\|_u,$$
so that $\|I^+\| \le \|I\|$.
Finally, let $I^- = I^+ - I$. Then $I^- \in C_0(X, \mathbb{R})^*$, and it is immediate from the
definition of $I^+$ that $I^+$ and $I^-$ are positive. ∎
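In finite dimensions the construction of $I^+$ can be carried out by brute force. Assume, purely for illustration, that $X$ is a finite discrete space, so $C_0(X, \mathbb{R}) = \mathbb{R}^n$ and $I(g) = \sum_i w_i g_i$ for some real weights $w_i$. For $f \ge 0$ the supremum defining $I^+(f)$ is the maximum of a linear function over the box $\{0 \le g \le f\}$, hence is attained at a vertex, and one expects $I^+(f) = \sum_i \max(w_i, 0) f_i$ and $I^-(f) = \sum_i \max(-w_i, 0) f_i$. The sketch below (hypothetical helper names) checks this against the definition used in the proof.

```python
from itertools import product


def I(w, g):
    """The functional I(g) = sum_i w_i g_i on R^n (weights may have either sign)."""
    return sum(wi * gi for wi, gi in zip(w, g))


def I_plus_nonneg(w, f):
    """I^+(f) = sup{I(g) : 0 <= g <= f} for f >= 0, by checking every vertex of the box."""
    return max(I(w, g) for g in product(*[(0, fi) for fi in f]))


def I_plus(w, f):
    """Extend to real f by I^+(f) = I^+(f^+) - I^+(f^-), as in the proof."""
    f_pos = [max(v, 0) for v in f]
    f_neg = [max(-v, 0) for v in f]
    return I_plus_nonneg(w, f_pos) - I_plus_nonneg(w, f_neg)


def I_minus(w, f):
    """I^- = I^+ - I."""
    return I_plus(w, f) - I(w, f)
```

In this discrete picture $I^\pm$ correspond to the positive and negative parts of the weight vector, which is the finite-dimensional shadow of the Jordan decomposition.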


Any $I \in C_0(X)^*$ is uniquely determined by its restriction $J$ to $C_0(X, \mathbb{R})$, and
we have $J = J_1 + iJ_2$ where $J_1, J_2$ are real linear functionals. We therefore
conclude from Lemma 7.15 and the discussion preceding it that for any $I \in C_0(X)^*$
there are finite Radon measures $\mu_1, \ldots, \mu_4$ such that $I(f) = \int f\,d\mu$ where
$\mu = \mu_1 - \mu_2 + i(\mu_3 - \mu_4)$.

At this point we need some more definitions. A signed Radon measure is
a signed Borel measure whose positive and negative variations are Radon, and a
complex Radon measure is a complex Borel measure whose real and imaginary
parts are signed Radon measures. (It is worth noting that on a second countable LCH
space, every complex Borel measure is Radon. This follows from Theorem 7.8 since
complex measures are bounded.) We denote the space of complex Radon measures
on $X$ by $M(X)$, and for $\mu \in M(X)$ we define
$$\|\mu\| = |\mu|(X),$$
where, of course, $|\mu|$ is the total variation of $\mu$.

7.16 Proposition. If $\mu$ is a complex Borel measure, then $\mu$ is Radon iff $|\mu|$ is Radon.
Moreover, $M(X)$ is a vector space and $\mu \mapsto \|\mu\|$ is a norm on it.


Proof. We observe that a finite positive Borel measure $\nu$ is Radon iff for every
Borel set $E$ and every $\epsilon > 0$ there exist a compact $K$ and an open $U$ such that
$K \subset E \subset U$ and $\nu(U \setminus K) < \epsilon$, by Propositions 7.5 and 7.7. The first assertion
follows easily from this. Indeed, if $\mu = \mu_1 - \mu_2 + i(\mu_3 - \mu_4)$ and $|\mu|(U \setminus K) < \epsilon$,
then $\mu_j(U \setminus K) < \epsilon$ for all $j$; conversely, if $\mu_j(U_j \setminus K_j) < \epsilon/4$ for all $j$, then
$|\mu|(U \setminus K) < \epsilon$ where $K = \bigcup_1^4 K_j$ and $U = \bigcap_1^4 U_j$. The same argument shows that
$M(X)$ is closed under addition and scalar multiplication. Finally, that $\|\cdot\|$ is a norm
on $M(X)$ follows from Proposition 3.14. ∎

7.17 The Riesz Representation Theorem. Let $X$ be an LCH space, and for $\mu \in M(X)$
and $f \in C_0(X)$ let $I_\mu(f) = \int f\,d\mu$. Then the map $\mu \mapsto I_\mu$ is an isometric
isomorphism from $M(X)$ to $C_0(X)^*$.


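Proof. We have already shown that every $I \in C_0(X)^*$ is of the form $I_\mu$. On the
other hand, if $\mu \in M(X)$, by Proposition 3.13c we have
$$\Big|\int f\,d\mu\Big| \le \int |f|\,d|\mu| \le \|f\|_u\|\mu\|,$$
so $I_\mu \in C_0(X)^*$ and $\|I_\mu\| \le \|\mu\|$. Moreover, if $h = d\mu/d|\mu|$, then $|h| = 1$ by
Proposition 3.13b, so by Lusin’s theorem (applied to $\bar h$ and the Radon measure $|\mu|$), for any
$\epsilon > 0$ there exists $f \in C_c(X)$ such that $\|f\|_u \le 1$ and $f = \bar h$ except on a set $E$ with
$|\mu|(E) < \epsilon/2$. Then
$$\|\mu\| = \int |h|^2\,d|\mu| = \int \bar h\,d\mu \le \Big|\int f\,d\mu\Big| + \int |f - \bar h|\,d|\mu| \le \Big|\int f\,d\mu\Big| + 2|\mu|(E) < \Big|\int f\,d\mu\Big| + \epsilon \le \|I_\mu\| + \epsilon.$$
It follows that $\|\mu\| \le \|I_\mu\|$, so the proof is complete. ∎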
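The isometry in Theorem 7.17 is transparent for a measure with finitely many atoms. Assuming, for illustration only, $\mu = \sum_i c_i\delta_{x_i}$ on a finite discrete space, we have $h = d\mu/d|\mu| = c_i/|c_i|$ at $x_i$, and the choice $f = \bar h$ (which is continuous here, so Lusin’s theorem is not needed) gives $I_\mu(f) = \sum_i |c_i| = \|\mu\|$ exactly, while $|I_\mu(g)| \le \|\mu\|$ whenever $\|g\|_u \le 1$. The names below are hypothetical.

```python
import cmath
import random


def total_variation(c):
    """||mu|| = |mu|(X) for mu = sum_i c_i delta_{x_i}."""
    return sum(abs(ci) for ci in c)


def I_mu(c, f):
    """I_mu(f) = integral of f d(mu) = sum_i f(x_i) c_i."""
    return sum(fi * ci for fi, ci in zip(f, c))


def norming_function(c):
    """f = conj(h) with h = d mu / d|mu| = c_i / |c_i| at each atom (0 where c_i = 0)."""
    return [complex(ci / abs(ci)).conjugate() if ci != 0 else 0j for ci in c]


def max_over_random_unit_functions(c, trials=500, seed=0):
    """Largest |I_mu(g)| seen over random g with |g(x_i)| <= 1 (a lower estimate of ||I_mu||)."""
    rng = random.Random(seed)
    best = 0.0
    for _ in range(trials):
        g = [cmath.rect(rng.random(), rng.uniform(0.0, 2.0 * cmath.pi)) for _ in c]
        best = max(best, abs(I_mu(c, g)))
    return best
```

Random test functions never beat $\|\mu\|$, and the norming function attains it; in the general LCH case the only extra work is approximating $\bar h$ by a continuous function, which is exactly what Lusin’s theorem supplies.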


7.18 Corollary. If X is a compact Hausdorff space, then C(X)* is isometrically 
isomorphic to M(X). 


Let $\mu$ be a fixed positive Radon measure on $X$. If $f \in L^1(\mu)$, the complex measure
$d\nu_f = f\,d\mu$ is easily seen to be Radon (Exercise 8), and $\|\nu_f\| = \int |f|\,d\mu = \|f\|_1$.
Thus $f \mapsto \nu_f$ is an isometric embedding of $L^1(\mu)$ into $M(X)$ whose range consists
precisely of those $\nu \in M(X)$ such that $\nu \ll \mu$. (The last statement follows from
the Radon-Nikodym theorem, which applies even if $\mu$ is not $\sigma$-finite; see §7.5.) The
most important example of this situation is $\mu = m$ = Lebesgue measure on $\mathbb{R}^n$, and
we shall identify $L^1(m)$ with a subspace of $M(\mathbb{R}^n)$.

The weak* topology on $M(X) = C_0(X)^*$, in which $\mu_\alpha \to \mu$ iff $\int f\,d\mu_\alpha \to
\int f\,d\mu$ for all $f \in C_0(X)$, is of considerable importance in applications; we shall call
it the vague topology on $M(X)$. (The term “vague” is common in probability theory
and has the advantage of forming an adverb more gracefully than “weak*.”) The
vague topology is sometimes called the weak topology, but this terminology conflicts
with ours, since $C_0(X)$ is rarely reflexive (see Exercise 20). Weak convergence
arguments for $L^p(\mu)$ generally fail for $p = 1$ because $L^1(\mu)$ is not the dual of
$L^\infty(\mu)$, but good substitute results can often be obtained by regarding $L^1(\mu)$ as a
subspace of $M(X)$ as in the preceding paragraph and using the vague topology there.

We conclude by presenting a useful criterion for vague convergence in $M(\mathbb{R})$.


7.19 Proposition. Suppose $\mu, \mu_1, \mu_2, \ldots \in M(\mathbb{R})$, and let $F_n(x) = \mu_n((-\infty, x])$
and $F(x) = \mu((-\infty, x])$.
a. If $\sup_n \|\mu_n\| < \infty$ and $F_n(x) \to F(x)$ for every $x$ at which $F$ is continuous,
then $\mu_n \to \mu$ vaguely.
b. If $\mu_n \to \mu$ vaguely, then $\sup_n \|\mu_n\| < \infty$. If, in addition, the $\mu_n$’s are positive,
then $F_n(x) \to F(x)$ at every $x$ at which $F$ is continuous.


Proof. (a) Since $F$ is continuous except at countably many points (Theorems 3.27 and 3.29), $F_n \to F$ a.e. with respect to Lebesgue measure. Also, $\|F_n\|_u \le \|\mu_n\|$, so the $F_n$'s are uniformly bounded. If $f$ is continuously differentiable and has compact support, then integration by parts (Theorem 3.36) and the dominated convergence theorem yield

$$\int f\,d\mu_n = -\int f'(x)F_n(x)\,dx \to -\int f'(x)F(x)\,dx = \int f\,d\mu.$$

But by Theorem 4.52, the set of all such $f$'s is dense in $C_0(\mathbb{R})$, so $\int f\,d\mu_n \to \int f\,d\mu$ for all $f \in C_0(\mathbb{R})$ by Proposition 5.17. Thus $\mu_n \to \mu$ vaguely.

(b) If $\mu_n \to \mu$ vaguely, then $\sup_n \|\mu_n\| < \infty$ by the uniform boundedness principle. Suppose that $\mu_n \ge 0$, and hence $\mu \ge 0$, and that $F$ is continuous at $x = a$. If $f \in C_c(\mathbb{R})$ is the function that is 1 on $[-N, a]$, 0 on $(-\infty, -N-\varepsilon]$ and $[a+\varepsilon, \infty)$, and linear in between, we have

$$F_n(a) - F_n(-N) = \mu_n((-N, a]) \le \int f\,d\mu_n \to \int f\,d\mu \le F(a+\varepsilon) - F(-N-\varepsilon).$$

As $N \to \infty$, $F_n(-N)$ and $F(-N-\varepsilon)$ tend to zero, so

$$\limsup_{n\to\infty} F_n(a) \le F(a+\varepsilon).$$

Similarly, by considering the function that is 1 on $[-N+\varepsilon, a-\varepsilon]$, 0 on $(-\infty, -N]$ and $[a, \infty)$, and linear in between, we see that

$$\liminf_{n\to\infty} F_n(a) \ge F(a-\varepsilon).$$

Since $\varepsilon$ is arbitrary and $F$ is continuous at $a$, we have $F_n(a) \to F(a)$ as desired. ∎


Exercises 


16. Suppose that $I \in C_0(X, \mathbb{R})^*$ and $I^+, I^-$ are the functionals constructed in the proof of Lemma 7.15. If $\mu$ is the signed Radon measure associated to $I$, then the positive and negative variations of $\mu$ are the Radon measures associated to $I^+$ and $I^-$.

17. If $\mu$ is a positive Radon measure on $X$ with $\mu(X) = \infty$, there exists $f \in C_0(X)$ such that $\int f\,d\mu = \infty$. Consequently, every positive linear functional on $C_0(X)$ is bounded.


18. If $\mu$ is a $\sigma$-finite Radon measure on $X$ and $\nu \in M(X)$, let $\nu = \nu_1 + \nu_2$ be the Lebesgue decomposition of $\nu$ with respect to $\mu$. Then $\nu_1$ and $\nu_2$ are Radon. (Use Exercise 8.)
19. Let $X$ be a completely regular space and $\mathcal{A}$ a completely regular subalgebra of $BC(X)$ (see Exercise 73 in §4.8). Find a description of $\mathcal{A}^*$ as a space of measures.


20. Some examples of nonreflexivity of $C_0(X)$:
a. If $\mu \in M(X)$, let $\Phi(\mu) = \sum_{x \in X} \mu(\{x\})$. This sum is well defined, and $\Phi \in M(X)^*$. If there exists a nonzero $\mu \in M(X)$ such that $\mu(\{x\}) = 0$ for all $x \in X$, then $\Phi$ is not in the image of $C_0(X)$ in $M(X)^* = C_0(X)^{**}$.
b. At the other extreme, let $X = \mathbb{N}$ with the discrete topology; then $C_0(X)^* \cong \ell^1$ and $(\ell^1)^* = \ell^\infty$. (Note: $C_0(\mathbb{N})$ is usually denoted by $c_0$.)


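21. Let $\{f_\alpha\}_{\alpha \in A}$ be a subset of $C_0(X)$ and $\{c_\alpha\}_{\alpha \in A}$ a family of complex numbers. If for each finite set $B \subset A$ there exists $\mu_B \in M(X)$ such that $\|\mu_B\| \le 1$ and $\int f_\alpha\,d\mu_B = c_\alpha$ for $\alpha \in B$, then there exists $\mu \in M(X)$ such that $\|\mu\| \le 1$ and $\int f_\alpha\,d\mu = c_\alpha$ for all $\alpha \in A$.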


22. A sequence $\{f_n\}$ in $C_0(X)$ converges weakly to $f \in C_0(X)$ iff $\sup_n \|f_n\|_u < \infty$ and $f_n \to f$ pointwise.


23. The hypothesis of positivity in Proposition 7.19b is necessary. (Take $\mu_n$ to be the difference of the point masses at $n^{-1}$ and $-n^{-1}$.)


24. Find examples of sequences $\{\mu_n\}$ in $M(\mathbb{R})$ such that
a. $\mu_n \to 0$ vaguely, but $\|\mu_n\| \not\to 0$.
b. $\mu_n \to 0$ vaguely, but $\int f\,d\mu_n \not\to 0$ for some bounded measurable $f$ with compact support.
c. $\mu_n \ge 0$ and $\mu_n \to 0$ vaguely, but there exists $x \in \mathbb{R}$ such that $F_n(x) \not\to F(x)$ (notation as in Proposition 7.19).


25. Let $\mu$ be a Radon measure on $X$ such that every nonempty open set has positive measure (e.g., Lebesgue measure). For each $x \in X$ there is a net $\{f_\alpha\}$ in $L^1(\mu)$ that converges vaguely in $M(X)$ to the point mass at $x$. If $X$ is first countable, the net can be taken to be a sequence. (Consider functions of the form $\mu(U)^{-1}\chi_U$.)


26. If $\{\mu_n\} \subset M(X)$, $\mu_n \to \mu$ vaguely, and $\|\mu_n\| \to \|\mu\|$, then $\int f\,d\mu_n \to \int f\,d\mu$ for every $f \in BC(X)$. (If $\mu = 0$ the result is trivial. Otherwise, there exists $g \in C_c(X)$ with $\|g\|_u \le 1$ such that $\int g\,d\mu > \|\mu\| - \varepsilon$, and $\int gf\,d\mu_n \to \int gf\,d\mu$ for $f \in BC(X)$.) Moreover, the hypothesis $\|\mu_n\| \to \|\mu\|$ cannot be omitted.


27. Let $C^k([0,1])$ be as in Exercise 9 in §5.1. If $I \in C^k([0,1])^*$, there exist $\mu \in M([0,1])$ and constants $c_0, \ldots, c_{k-1}$, all unique, such that

$$I(f) = \int f^{(k)}\,d\mu + \sum_{j=0}^{k-1} c_j f^{(j)}(0).$$

(The functionals $f \mapsto f^{(j)}(0)$ could be replaced by any set of $k$ functionals that separate points in the space of polynomials of degree $< k$.)
7.4 PRODUCTS OF RADON MEASURES


In this section we study Radon measures on product spaces. $X$ and $Y$ will denote LCH spaces, and $\pi_X$ and $\pi_Y$ will denote the projections of $X \times Y$ onto $X$ and $Y$, respectively.


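7.20 Theorem.
a. $\mathcal{B}_X \otimes \mathcal{B}_Y \subset \mathcal{B}_{X \times Y}$.
b. If $X$ and $Y$ are second countable, then $\mathcal{B}_X \otimes \mathcal{B}_Y = \mathcal{B}_{X \times Y}$.
c. If $X$ and $Y$ are second countable and $\mu$ and $\nu$ are Radon measures on $X$ and $Y$, then $\mu \times \nu$ is a Radon measure on $X \times Y$.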


Proof. Parts (a) and (b) are direct generalizations of Proposition 1.5, and the proof is essentially the same. The main tool is Proposition 1.4. It implies, first, that $\mathcal{B}_X \otimes \mathcal{B}_Y$ is generated by the sets $U \times V$ where $U$ is open in $X$ and $V$ is open in $Y$. Since these sets are open in $X \times Y$, we have $\mathcal{B}_X \otimes \mathcal{B}_Y \subset \mathcal{B}_{X \times Y}$. If the topologies on $X$ and $Y$ have countable bases $\mathcal{E}$ and $\mathcal{F}$, then every open set in $X$, $Y$, or $X \times Y$ is a countable union of sets in $\mathcal{E}$, $\mathcal{F}$, or $\{U \times V : U \in \mathcal{E},\ V \in \mathcal{F}\}$. It follows that $\mathcal{B}_X$, $\mathcal{B}_Y$, and $\mathcal{B}_{X \times Y}$ are generated by these families and hence that $\mathcal{B}_X \otimes \mathcal{B}_Y = \mathcal{B}_{X \times Y}$. As for (c), $\mu \times \nu$ is a Borel measure by (b), so by Theorem 7.8 we need only show that $(\mu \times \nu)(K)$ is finite for every compact $K \subset X \times Y$. But this is easy: $\pi_X(K)$ and $\pi_Y(K)$ are compact, and $K \subset \pi_X(K) \times \pi_Y(K)$, so $(\mu \times \nu)(K) \le \mu(\pi_X(K))\,\nu(\pi_Y(K)) < \infty$. ∎


When $X$ or $Y$ is not second countable it can happen that $\mathcal{B}_X \otimes \mathcal{B}_Y \ne \mathcal{B}_{X \times Y}$; see Exercises 28 and 29. In this case the product of Radon measures is certainly not a Radon measure. However, there is a natural way of manufacturing a Radon measure from it. To see this, we need a couple of facts about continuous functions. If $g$ and $h$ are functions on $X$ and $Y$, we define $g \otimes h$ on $X \times Y$ by

$$g \otimes h(x, y) = g(x)h(y).$$


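7.21 Proposition. Let $\mathcal{P}$ be the vector space spanned by the functions $g \otimes h$ with $g \in C_c(X)$, $h \in C_c(Y)$. Then $\mathcal{P}$ is dense in $C_c(X \times Y)$ in the uniform norm. More precisely, given $f \in C_c(X \times Y)$, $\varepsilon > 0$, and precompact open sets $U \subset X$ and $V \subset Y$ containing $\pi_X(\operatorname{supp}(f))$ and $\pi_Y(\operatorname{supp}(f))$, there exists $F \in \mathcal{P}$ such that $\|F - f\|_u < \varepsilon$ and $\operatorname{supp}(F) \subset U \times V$.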


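Proof. $\overline{U} \times \overline{V}$ is a compact Hausdorff space. It follows easily from the Stone-Weierstrass theorem that the linear span of $\{g \otimes h : g \in C(\overline{U}),\ h \in C(\overline{V})\}$ is dense in $C(\overline{U} \times \overline{V})$. In particular, there is an element $G$ of this linear span such that $\sup_{\overline{U} \times \overline{V}} |G - f| < \varepsilon$. Also, by Urysohn's lemma there exist $\phi \in C_c(U, [0,1])$ and $\psi \in C_c(V, [0,1])$ such that $\phi = 1$ on $\pi_X(\operatorname{supp}(f))$ and $\psi = 1$ on $\pi_Y(\operatorname{supp}(f))$. Thus if we define $F = (\phi \otimes \psi)G$ on $\overline{U} \times \overline{V}$ and $F = 0$ elsewhere, we have $F \in \mathcal{P}$, $\operatorname{supp}(F) \subset U \times V$, and $\|F - f\|_u < \varepsilon$. ∎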
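As a concrete illustration of our own choosing, take $f(x, y) = e^{xy}$ on the compact square $[0,1]^2$. Its Taylor expansion $e^{xy} = \sum_k (x^k/k!)\,y^k$ is a series of elementary tensors $g_k \otimes h_k$, and its partial sums converge uniformly. This sketch ignores the compact-support cutoff $\phi \otimes \psi$ used in the proof; it only shows the density of sums of products on a compact product, with the uniform norm estimated on a finite grid (the grid size is an arbitrary assumption).

```python
import math

def tensor_approx(x, y, terms):
    """Sum of `terms` elementary tensors g_k(x) * h_k(y) with
    g_k(x) = x**k / k! and h_k(y) = y**k (truncated Taylor series of exp(xy))."""
    return sum((x ** k / math.factorial(k)) * (y ** k) for k in range(terms))

def sup_error(terms, grid=50):
    # grid estimate of the uniform norm of exp(xy) minus the approximation on [0,1]^2
    pts = [i / grid for i in range(grid + 1)]
    return max(abs(math.exp(x * y) - tensor_approx(x, y, terms))
               for x in pts for y in pts)

for terms in (1, 2, 4, 8):
    print(terms, sup_error(terms))
```

The error is largest at $(1,1)$, where it equals the tail $\sum_{k \ge K} 1/k!$ of the series for $e$.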
7.22 Proposition. Every $f \in C_c(X \times Y)$ is $\mathcal{B}_X \otimes \mathcal{B}_Y$-measurable. Moreover, if $\mu$ and $\nu$ are Radon measures on $X$ and $Y$, then $C_c(X \times Y) \subset L^1(\mu \times \nu)$, and

$$\int f\,d(\mu \times \nu) = \iint f\,d\mu\,d\nu = \iint f\,d\nu\,d\mu \qquad (f \in C_c(X \times Y)).$$

Proof. If $g \in C_c(X)$ and $h \in C_c(Y)$, we have $g \otimes h = (g \circ \pi_X)(h \circ \pi_Y)$. Since $\pi_X$ and $\pi_Y$ are measurable from $\mathcal{B}_X \otimes \mathcal{B}_Y$ to $\mathcal{B}_X$ and $\mathcal{B}_Y$ (by definition of $\mathcal{B}_X \otimes \mathcal{B}_Y$) and $g$ and $h$ are continuous, $g \circ \pi_X$ and $h \circ \pi_Y$ are $\mathcal{B}_X \otimes \mathcal{B}_Y$-measurable. Since products, sums, and pointwise limits of measurable functions are measurable, the first assertion follows from Proposition 7.21. Also, every $f \in C_c(X \times Y)$ is bounded and supported in a set of finite $(\mu \times \nu)$-measure, hence is in $L^1(\mu \times \nu)$. Fubini's theorem holds for such $f$ even if $\mu$ and $\nu$ are not $\sigma$-finite because one can replace $\mu$ and $\nu$ by the finite measures $\mu|\pi_X(\operatorname{supp}(f))$ and $\nu|\pi_Y(\operatorname{supp}(f))$. ∎


It is now clear how to obtain a Radon measure on $X \times Y$ from Radon measures $\mu$ and $\nu$ on $X$ and $Y$. Namely, by Proposition 7.22 the formula $I(f) = \int f\,d(\mu \times \nu)$ defines a positive linear functional on $C_c(X \times Y)$, so it determines a Radon measure on $X \times Y$ by the Riesz representation theorem. We call this measure the Radon product of $\mu$ and $\nu$ and denote it by $\mu \hat{\times} \nu$. The obvious question is: Does $\mu \hat{\times} \nu$ agree with $\mu \times \nu$ on $\mathcal{B}_X \otimes \mathcal{B}_Y$? In general, the answer is no. Indeed, a counterexample may be obtained by taking $X = \mathbb{R}$, $Y = \mathbb{R}_d$ ($\mathbb{R}$ with the discrete topology), $\mu$ = Lebesgue measure, and $\nu$ = counting measure. It is not hard to see that in this case $\mathcal{B}_{X \times Y} = \mathcal{B}_X \otimes \mathcal{B}_Y$, but Exercises 12 and 14 show that $\mu \hat{\times} \nu$ is not semifinite and that $\mu \times \nu$ is the semifinite part of $\mu \hat{\times} \nu$. However, some results are still available, and in the $\sigma$-finite case everything works out beautifully. In what follows, we employ the notation of $x$-sections and $y$-sections introduced in §2.5.


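7.23 Lemma.
a. If $E \in \mathcal{B}_{X \times Y}$, then $E_x \in \mathcal{B}_Y$ for all $x \in X$ and $E^y \in \mathcal{B}_X$ for all $y \in Y$.
b. If $f : X \times Y \to \mathbb{C}$ is $\mathcal{B}_{X \times Y}$-measurable, then $f_x$ is $\mathcal{B}_Y$-measurable for all $x \in X$ and $f^y$ is $\mathcal{B}_X$-measurable for all $y \in Y$.

Proof. The collection of all $E \subset X \times Y$ such that $E_x \in \mathcal{B}_Y$ and $E^y \in \mathcal{B}_X$ for all $x, y$ is easily seen to be a $\sigma$-algebra. It contains all open sets (if $E$ is open, so are $E_x$ and $E^y$, being inverse images of $E$ under the continuous maps $y' \mapsto (x, y')$ and $x' \mapsto (x', y)$), and hence it contains $\mathcal{B}_{X \times Y}$. This proves (a), and (b) follows since

$$(f_x)^{-1}(A) = (f^{-1}(A))_x \quad\text{and}\quad (f^y)^{-1}(A) = (f^{-1}(A))^y. \qquad ∎$$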


7.24 Lemma. If $f \in C_c(X \times Y)$ and $\mu$ and $\nu$ are Radon measures on $X$ and $Y$, then the functions $x \mapsto \int f_x\,d\nu$ and $y \mapsto \int f^y\,d\mu$ are continuous.

Proof. We write out the proof only for $f_x$. It suffices to show that for any $x_0 \in X$ and $\varepsilon > 0$ there is a neighborhood $U$ of $x_0$ such that $\|f_x - f_{x_0}\|_u < \varepsilon$ for $x \in U$, since then

$$\Big|\int f_x\,d\nu - \int f_{x_0}\,d\nu\Big| \le \varepsilon\,\nu\big(\pi_Y(\operatorname{supp}(f))\big).$$

However, for each $y \in \pi_Y(\operatorname{supp}(f))$ there exist neighborhoods $U_y$, $V_y$ of $x_0$ and $y$ such that if $(x, z) \in U_y \times V_y$, then $|f(x_0, y) - f(x, z)| < \frac12\varepsilon$. We may choose a finite subcover $V_{y_1}, \ldots, V_{y_n}$ of $\pi_Y(\operatorname{supp}(f))$ and then take $U = \bigcap_1^n U_{y_j}$; details are left to the reader. ∎


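7.25 Proposition. Let $\mu$ and $\nu$ be Radon measures on $X$ and $Y$. If $U$ is open in $X \times Y$, then the functions $x \mapsto \nu(U_x)$ and $y \mapsto \mu(U^y)$ are Borel measurable on $X$ and $Y$, and

$$\mu \hat{\times} \nu(U) = \int \nu(U_x)\,d\mu(x) = \int \mu(U^y)\,d\nu(y).$$

Proof. Let $\mathcal{F} = \{f \in C_c(X \times Y) : 0 \le f \le \chi_U\}$. By Proposition 7.11 we have $\chi_U = \sup\{f : f \in \mathcal{F}\}$ and hence $\chi_{U_x} = \sup\{f_x : f \in \mathcal{F}\}$ and $\chi_{U^y} = \sup\{f^y : f \in \mathcal{F}\}$. Thus by Proposition 7.12,

$$\mu \hat{\times} \nu(U) = \sup\Big\{\int f\,d(\mu \hat{\times} \nu) : f \in \mathcal{F}\Big\},$$

$$\nu(U_x) = \sup\Big\{\int f_x\,d\nu : f \in \mathcal{F}\Big\}, \qquad \mu(U^y) = \sup\Big\{\int f^y\,d\mu : f \in \mathcal{F}\Big\}.$$

From Lemma 7.24 and Proposition 7.11 it follows that $x \mapsto \nu(U_x)$ and $y \mapsto \mu(U^y)$ are LSC and hence Borel measurable. Another application of Proposition 7.12, together with Proposition 7.22, yields

$$\mu \hat{\times} \nu(U) = \sup\Big\{\iint f_x\,d\nu\,d\mu(x) : f \in \mathcal{F}\Big\} = \int \sup\Big\{\int f_x\,d\nu : f \in \mathcal{F}\Big\}\,d\mu(x) = \int \nu(U_x)\,d\mu(x),$$

and likewise $\mu \hat{\times} \nu(U) = \int \mu(U^y)\,d\nu(y)$. ∎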


7.26 Theorem. Suppose that $\mu$ and $\nu$ are $\sigma$-finite Radon measures on $X$ and $Y$. If $E \in \mathcal{B}_{X \times Y}$, then the functions $x \mapsto \nu(E_x)$ and $y \mapsto \mu(E^y)$ (which make sense by Lemma 7.23) are Borel measurable on $X$ and $Y$, and

$$\mu \hat{\times} \nu(E) = \int \nu(E_x)\,d\mu(x) = \int \mu(E^y)\,d\nu(y).$$

Moreover, the restriction of $\mu \hat{\times} \nu$ to $\mathcal{B}_X \otimes \mathcal{B}_Y$ is $\mu \times \nu$.

Proof. For the moment, let us fix open sets $U \subset X$ and $V \subset Y$ with $\mu(U)$ and $\nu(V)$ finite, and let $W = U \times V$. Let $\mathcal{M}$ be the collection of all sets $E \in \mathcal{B}_{X \times Y}$ such that $E \cap W$ satisfies the conclusions of the theorem. We then have

i. $\mathcal{M}$ contains all open sets, by Proposition 7.25.
ii. If $E, F \in \mathcal{M}$ and $F \subset E$, then $E \setminus F \in \mathcal{M}$; in particular, if $F \in \mathcal{M}$, then $F^c = (X \times Y) \setminus F \in \mathcal{M}$. Indeed, we have

$$\mu \hat{\times} \nu(E \cap W) = \mu \hat{\times} \nu(F \cap W) + \mu \hat{\times} \nu((E \setminus F) \cap W),$$

and likewise for $\nu((E \cap W)_x)$ and $\mu((E \cap W)^y)$. Since the conclusions are true for $E \cap W$ and $F \cap W$ and all the sets involved have finite measure (this is why we introduced $W$), we can subtract to obtain the conclusions for $(E \setminus F) \cap W$.

iii. $\mathcal{M}$ is closed under finite disjoint unions. (This is simply the additivity of the measures.)

iv. $\mathcal{M}$ is closed under countable increasing unions, and hence (by (ii)) under countable decreasing intersections. (This follows from the monotone convergence theorem.)


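Now, let $\mathcal{E} = \{A \setminus B : A, B \text{ open in } X \times Y\}$, and let $\mathcal{A}$ be the collection of finite disjoint unions of sets in $\mathcal{E}$. Since

$$(A_1 \setminus B_1) \cap (A_2 \setminus B_2) = (A_1 \cap A_2) \setminus (B_1 \cup B_2), \qquad (A \setminus B)^c = [(X \times Y) \setminus A] \cup [(A \cap B) \setminus \varnothing],$$

$\mathcal{E}$ is an elementary family, so by Proposition 1.7, $\mathcal{A}$ is an algebra. By Lemma 2.35, the monotone class generated by $\mathcal{A}$ coincides with the $\sigma$-algebra generated by $\mathcal{A}$, which is clearly $\mathcal{B}_{X \times Y}$. But by (i)-(iv) (since $A \setminus B = A \setminus (A \cap B)$), $\mathcal{M}$ contains this monotone class, so $\mathcal{M} = \mathcal{B}_{X \times Y}$.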

Next, since $\mu$ and $\nu$ are $\sigma$-finite and outer regular, we have $X = \bigcup_1^\infty U_n$ and $Y = \bigcup_1^\infty V_n$ where $U_n$ and $V_n$ are open and have finite measure, and we may assume that the sequences $\{U_n\}$ and $\{V_n\}$ are increasing. If $E \in \mathcal{B}_{X \times Y}$, the preceding argument shows that $E \cap (U_n \times V_n)$ satisfies the conclusions of the theorem for all $n$, and the monotone convergence theorem then implies that $E$ satisfies the conclusions too.

Finally, if $E \in \mathcal{B}_X \otimes \mathcal{B}_Y$, by Tonelli's theorem we have

$$\mu \times \nu(E) = \int \nu(E_x)\,d\mu(x) = \mu \hat{\times} \nu(E),$$

and the proof is complete. ∎


7.27 The Fubini-Tonelli Theorem for Radon Products. Let $\mu$ and $\nu$ be $\sigma$-finite Radon measures on $X$ and $Y$, and let $f$ be a Borel measurable function on $X \times Y$. Then $f_x$ and $f^y$ are Borel measurable for every $x$ and $y$. If $f \ge 0$, then $x \mapsto \int f_x\,d\nu$ and $y \mapsto \int f^y\,d\mu$ are Borel measurable on $X$ and $Y$. If $f \in L^1(\mu \hat{\times} \nu)$, then $f_x \in L^1(\nu)$ for a.e. $x$, $f^y \in L^1(\mu)$ for a.e. $y$, and $x \mapsto \int f_x\,d\nu$ and $y \mapsto \int f^y\,d\mu$ are in $L^1(\mu)$ and $L^1(\nu)$. In both cases, we have

$$\int f\,d(\mu \hat{\times} \nu) = \iint f\,d\mu\,d\nu = \iint f\,d\nu\,d\mu.$$
Proof. The measurability of $f_x$ and $f^y$ was established in Lemma 7.23. The rest of the proof is identical to the proof of the ordinary Fubini-Tonelli theorem, except that Theorem 7.26 is used in place of Theorem 2.36. ∎


The extension of the notion of Radon products to any finite number of factors is straightforward. More interestingly, the theory can be extended to infinitely many factors provided that the spaces in question are compact and the measures on them are normalized to have total mass 1.

To be precise, suppose that $\{X_\alpha\}_{\alpha \in A}$ is a family of compact Hausdorff spaces and, for each $\alpha$, $\mu_\alpha$ is a Radon measure on $X_\alpha$ such that $\mu_\alpha(X_\alpha) = 1$. Let $X = \prod_{\alpha \in A} X_\alpha$, a compact Hausdorff space by Tychonoff's theorem. We would like to define a Radon measure $\mu$ on $X$ such that if $E_\alpha$ is a Borel set in $X_\alpha$ for each $\alpha$ and $E_\alpha = X_\alpha$ for all but finitely many $\alpha$, then $\mu(\prod_{\alpha \in A} E_\alpha) = \prod_{\alpha \in A} \mu_\alpha(E_\alpha)$. (The product on the right is well defined since all but finitely many factors are equal to 1.) A bit of notation will be helpful: Given $\alpha_1, \ldots, \alpha_n \in A$, let $\pi_{(\alpha_1, \ldots, \alpha_n)}$ be the natural projection from $X$ onto $\prod_1^n X_{\alpha_j}$,

$$\pi_{(\alpha_1, \ldots, \alpha_n)}(x) = (x_{\alpha_1}, \ldots, x_{\alpha_n}).$$

Thus $\pi_{(\alpha_1, \ldots, \alpha_n)}^{-1}(E_{\alpha_1} \times \cdots \times E_{\alpha_n}) = \prod_{\alpha \in A} E_\alpha$ where $E_\alpha = X_\alpha$ for $\alpha \ne \alpha_1, \ldots, \alpha_n$.


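7.28 Theorem. Suppose that, for each $\alpha \in A$, $\mu_\alpha$ is a Radon measure on the compact Hausdorff space $X_\alpha$ such that $\mu_\alpha(X_\alpha) = 1$. Then there is a unique Radon measure $\mu$ on $X = \prod_{\alpha \in A} X_\alpha$ such that for any $\alpha_1, \ldots, \alpha_n \in A$ and any Borel set $E$ in $\prod_1^n X_{\alpha_j}$,

$$\mu\big(\pi_{(\alpha_1, \ldots, \alpha_n)}^{-1}(E)\big) = (\mu_{\alpha_1} \hat{\times} \cdots \hat{\times} \mu_{\alpha_n})(E).$$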


Proof. Let $C_F(X)$ be the set of all $f \in C(X)$ that depend on only finitely many coordinates, that is, all $f$ of the form $f = g \circ \pi_{(\alpha_1, \ldots, \alpha_n)}$ for some $\alpha_1, \ldots, \alpha_n \in A$ and $g \in C(\prod_1^n X_{\alpha_j})$. If $f$ is such a function, we define

$$I(f) = \int g\,d(\mu_{\alpha_1} \hat{\times} \cdots \hat{\times} \mu_{\alpha_n}).$$

Adding on some extra coordinates to the set $\alpha_1, \ldots, \alpha_n$ has no effect on this formula since $\mu_\alpha(X_\alpha) = 1$ for all $\alpha$. Thus $I$ is a well-defined positive linear functional on $C_F(X)$, and $|I(f)| \le \|f\|_u$ with equality when $f$ is constant.

Now, $C_F(X)$ is clearly an algebra that separates points, contains constant functions, and is closed under complex conjugation, so by the Stone-Weierstrass theorem it is dense in $C(X)$. Hence the functional $I$ extends uniquely to a positive linear functional of norm 1 on $C(X)$, and the Riesz representation theorem therefore yields a unique Radon measure $\mu$ on $X$ such that $I(f) = \int f\,d\mu$ for all $f \in C_F(X)$.

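Given $\alpha_1, \ldots, \alpha_n \in A$, let $\mu_{(\alpha_1, \ldots, \alpha_n)} = \mu \circ \pi_{(\alpha_1, \ldots, \alpha_n)}^{-1}$. Then $\mu_{(\alpha_1, \ldots, \alpha_n)}$ is a Borel measure on $\prod_1^n X_{\alpha_j}$ that satisfies

$$\int g\,d\mu_{(\alpha_1, \ldots, \alpha_n)} = \int g \circ \pi_{(\alpha_1, \ldots, \alpha_n)}\,d\mu$$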
when $g$ is the characteristic function of a Borel set, and hence (by the usual linearity and approximation arguments) when $g$ is any bounded Borel function. In particular, from the definition of $\mu$, for all $g \in C(\prod_1^n X_{\alpha_j})$ we have

$$\int g\,d\mu_{(\alpha_1, \ldots, \alpha_n)} = \int g\,d(\mu_{\alpha_1} \hat{\times} \cdots \hat{\times} \mu_{\alpha_n}).$$

If we can show that $\mu_{(\alpha_1, \ldots, \alpha_n)}$ is Radon, the uniqueness of the Riesz representation will imply that $\mu_{(\alpha_1, \ldots, \alpha_n)} = \mu_{\alpha_1} \hat{\times} \cdots \hat{\times} \mu_{\alpha_n}$, which will complete the proof.

Let $E$ be a Borel set in $\prod_1^n X_{\alpha_j}$, and write $\pi = \pi_{(\alpha_1, \ldots, \alpha_n)}$ for short. Since $\mu$ is regular, for any $\varepsilon > 0$ there is a compact $K \subset \pi^{-1}(E)$ such that $\mu(K) > \mu(\pi^{-1}(E)) - \varepsilon$. Then $K' = \pi(K)$ is a compact subset of $E$, and $\mu_{(\alpha_1, \ldots, \alpha_n)}(K') \ge \mu(K)$ since $K \subset \pi^{-1}(K')$, so $\mu_{(\alpha_1, \ldots, \alpha_n)}(K') > \mu(\pi^{-1}(E)) - \varepsilon = \mu_{(\alpha_1, \ldots, \alpha_n)}(E) - \varepsilon$. Thus $\mu_{(\alpha_1, \ldots, \alpha_n)}$ is inner regular, and the same argument applied to $E^c$ shows that it is outer regular. Thus $\mu_{(\alpha_1, \ldots, \alpha_n)}$ is Radon, and we are done. ∎
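The key step in the proof, that adding coordinates does not change $I(f)$ because each $\mu_\alpha(X_\alpha) = 1$, can be checked in a toy case. The sketch below is an illustration under simplifying assumptions, not part of the text: each $X_\alpha = \{0,1\}$ carries the hypothetical Bernoulli measure $\mu_\alpha(\{1\}) = p_\alpha$, and a function of finitely many coordinates is integrated exactly with rational arithmetic. The helper `integrate` and the particular weights are ours.

```python
from fractions import Fraction
from itertools import product

def integrate(g, weights):
    """Integrate g over {0, 1}^len(weights) against the product of the
    measures with mass weights[j] on 1 and 1 - weights[j] on 0 in factor j.
    Each factor measure has total mass 1."""
    total = Fraction(0)
    for point in product((0, 1), repeat=len(weights)):
        mass = Fraction(1)
        for bit, p in zip(point, weights):
            mass *= p if bit == 1 else 1 - p
        total += mass * g(point)
    return total

# A hypothetical function depending only on the first two coordinates.
g = lambda pt: pt[0] + 2 * pt[0] * pt[1]
w = [Fraction(1, 3), Fraction(1, 2)]

two = integrate(g, w)
# Adding a third coordinate that g ignores leaves the integral unchanged.
three = integrate(lambda pt: g(pt[:2]), w + [Fraction(1, 5)])
print(two, three)  # prints: 2/3 2/3
```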


Exercises 


28. If $X$ is the set of ordinals less than or equal to the first uncountable ordinal $\omega_1$, with the order topology, then $\mathcal{B}_{X \times X} \ne \mathcal{B}_X \otimes \mathcal{B}_X$. In fact, $\{(x, y) : x < y < \omega_1\}$ is open but not in $\mathcal{B}_X \otimes \mathcal{B}_X$. (Reexamine Exercise 47 in §2.5 in the light of Exercise 15 in §7.2.)


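29. If $X$ is a set of cardinality $> \mathfrak{c}$ with the discrete topology, then $\mathcal{B}_{X \times X} \ne \mathcal{B}_X \otimes \mathcal{B}_X$. In fact, $D = \{(x, y) : x = y\}$ is closed but not in $\mathcal{B}_X \otimes \mathcal{B}_X$. (Use Exercise 5 in §1.2 and Proposition 1.23. If $D \in \mathcal{B}_X \otimes \mathcal{B}_X$, then $D \in \mathcal{M}$ where $\mathcal{M}$ is a sub-$\sigma$-algebra of $\mathcal{B}_X \otimes \mathcal{B}_X$ generated by a countable family of rectangles, hence $D \in \mathcal{N} \otimes \mathcal{N}$ where $\mathcal{N}$ is a countably generated sub-$\sigma$-algebra of $\mathcal{B}_X$. Then $\{x\} = D_x \in \mathcal{N}$ for all $x$, but $\operatorname{card}(\mathcal{N}) \le \mathfrak{c}$.) The same reasoning applies if $X$ is replaced by its one-point compactification.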


30. Let $\mu$ and $\nu$ be Radon measures on $X$ and $Y$, not necessarily $\sigma$-finite. If $f$ is a nonnegative LSC function on $X \times Y$, then $x \mapsto \int f_x\,d\nu$ and $y \mapsto \int f^y\,d\mu$ are Borel measurable and $\int f\,d(\mu \hat{\times} \nu) = \iint f\,d\mu\,d\nu = \iint f\,d\nu\,d\mu$.


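31. Some results concerning Baire sets on product spaces:
a. $\mathcal{B}^0_{X \times Y} \subset \mathcal{B}^0_X \otimes \mathcal{B}^0_Y$. (Hint: Proposition 7.22 remains true if $\mathcal{B}$ is replaced by $\mathcal{B}^0$.)
b. If $X$ and $Y$ are either compact or second countable, then $\mathcal{B}^0_{X \times Y} = \mathcal{B}^0_X \otimes \mathcal{B}^0_Y$.
c. If $X$ is an uncountable set with the discrete topology, then $\mathcal{B}^0_{X \times X} \ne \mathcal{B}^0_X \otimes \mathcal{B}^0_X$.
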
7.5 NOTES AND REFERENCES 


§7.1: The Riesz representation theorem is actually the work of many hands. F. Riesz [116] first proved it for the case $X = [a, b] \subset \mathbb{R}$; he formulated the result in terms of Riemann-Stieltjes integrals and used no measure theory. It was extended to compact subsets of $\mathbb{R}^n$ by Radon [111], to compact metric spaces by Banach (see Saks [128]), and to compact Hausdorff spaces by Kakutani [80]. For the noncompact case, the first general results were obtained by Markov [98], who characterized positive linear functionals on $BC(X)$ for a normal space $X$ in terms of certain finitely additive set functions. A theorem essentially equivalent to Theorem 7.2 was apparently known to Bourbaki about 1940 (see Weil [158, §6]), but his treatment of integration was not published until 1952, by which time several others had obtained similar results. For more detailed references, see Dunford and Schwartz [35, §IV.16] and Hewitt and Ross [75, §11]. See also König [86] for a generalization of the Riesz representation theorem to spaces that are not locally compact.

Our use of the term “Radon measure,” which derives from Radon's seminal paper [111], is common but not entirely standard. Some authors refer to such measures as “regular Borel measures”; others use the term “Radon measure” to mean a positive linear functional on $C_c(X)$, and still others define Radon measures to be inner regular rather than outer regular on all Borel sets. It should also be noted that some older texts define the Borel $\sigma$-algebra to be the $\sigma$-algebra generated by the compact sets, which is in general smaller than our $\mathcal{B}_X$.

If $\mu$ is a Radon measure, let $\tilde{\mu}$ denote the complete, saturated extension of $\mu$ discussed at the end of §7.1. It is a significant fact that $\tilde{\mu}$ is always decomposable in the sense of Exercise 15 in §3.2; see Hewitt and Stromberg [76, Theorem 19.30]. Consequently, the extension of the Radon-Nikodym theorem in that exercise and the fact that $L^1(\tilde{\mu})^* = L^\infty(\tilde{\mu})$ (Exercise 25 in §6.2) are available. In this connection one should note that $L^p(\tilde{\mu})$ is essentially identical to $L^p(\mu)$ for $p < \infty$, by Propositions 2.12 and 2.20.


§7.2: See Cohn [27, Proposition 7.2.3] for a proof of Theorem 7.8 that does not use the Riesz representation theorem.

Propositions 7.12 and 7.14 suggest an alternative way of constructing the Radon measure $\mu$ associated to a positive linear functional $I$ on $C_c(X)$ in the spirit of the Daniell integral (see §2.8). Namely, one first extends $I$ to nonnegative LSC functions $g$ by setting

$$I(g) = \sup\{I(f) : f \in C_c(X),\ 0 \le f \le g\}$$

and then extends $I$ to arbitrary nonnegative functions $h$ by setting

$$I(h) = \inf\{I(g) : g \text{ LSC},\ g \ge h\}.$$

It is then not difficult to verify that if $E \subset X$, $I(\chi_E) = \mu^*(E)$ where $\mu^*$ is the outer measure in the proof of Theorem 7.2. For details, see Hewitt and Ross [75] or Hewitt and Stromberg [76].

Kupka and Prikry [88] contains a readable discussion of some of the more advanced 
topics in the theory of measures on LCH spaces. 


§7.3: Theorem 7.17 is frequently stated only for the case where $X$ is compact; however, the more general formulation follows easily from the compact case by considering the one-point compactification of $X$. An interesting proof of the Baire measure version of this result, quite different from ours, can be found in Hartig [67].
§7.4: The Fubini-Tonelli theorem for Radon products, as presented here, is essentially due to deLeeuw [31]; see also Cohn [27, §7.6]. Another variant of this theorem, which includes some further results for the non-$\sigma$-finite case, can be found in Hewitt and Ross [75, §13].

Theorem 7.28 is essentially due to Nelson [103]. There is also a purely measure-theoretic version of this result: If $\{(X_\alpha, \mathcal{M}_\alpha, \mu_\alpha)\}_{\alpha \in A}$ is a family of measure spaces with $\mu_\alpha(X_\alpha) = 1$ for all $\alpha$, one can define a product measure $\prod_\alpha \mu_\alpha$ on $(\prod_\alpha X_\alpha, \bigotimes_\alpha \mathcal{M}_\alpha)$. See Saeki [127], Halmos [62, §38], or Hewitt and Stromberg [76, §22]. The hypotheses of Theorem 7.28 are more restrictive, but the conclusion is stronger; in particular, the domain of the measure $\mu$ in Theorem 7.28 is the Borel $\sigma$-algebra on $\prod_\alpha X_\alpha$, which is much larger than $\bigotimes_\alpha \mathcal{B}_{X_\alpha}$ when the index set $A$ is uncountable.





Elements of Fourier Analysis


It is easy to say that Fourier analysis, or harmonic analysis, originated in the work 
of Euler, Fourier, and others on trigonometric series; it is much harder to describe 
succinctly what the subject comprises today, for it is a meeting ground for ideas from 
many parts of analysis and has applications in such diverse areas as partial differential 
equations and algebraic number theory. Two of the central ingredients of harmonic 
analysis, however, are convolution operators and the Fourier transform, which we 
study in this chapter. 


8.1 PRELIMINARIES 


We begin by making some notational conventions. Throughout this chapter we shall be working on $\mathbb{R}^n$, and $n$ will always refer to the dimension. In any measure-theoretic considerations we always have Lebesgue measure in mind unless we specify otherwise. Thus, if $E$ is a measurable set in $\mathbb{R}^n$, we shall denote $L^p(E, m)$ by $L^p(E)$. If $U$ is open in $\mathbb{R}^n$ and $k \in \mathbb{N}$, we denote by $C^k(U)$ the space of all functions on $U$ whose partial derivatives of order $\le k$ all exist and are continuous, and we set $C^\infty(U) = \bigcap_1^\infty C^k(U)$. Furthermore, for any $E \subset \mathbb{R}^n$ we denote by $C_c^\infty(E)$ the space of all $C^\infty$ functions on $\mathbb{R}^n$ whose support is compact and contained in $E$. If $E = \mathbb{R}^n$ or $U = \mathbb{R}^n$, we shall usually omit it in naming function spaces: thus, $L^p = L^p(\mathbb{R}^n)$, $C^k = C^k(\mathbb{R}^n)$, $C_c^\infty = C_c^\infty(\mathbb{R}^n)$. If $x, y \in \mathbb{R}^n$, we set

$$x \cdot y = \sum_1^n x_j y_j, \qquad |x| = \sqrt{x \cdot x}.$$
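It will be convenient to have a compact notation for partial derivatives. We shall write

$$\partial_j = \frac{\partial}{\partial x_j},$$

and for higher-order derivatives we use multi-index notation. A multi-index is an ordered $n$-tuple of nonnegative integers. If $\alpha = (\alpha_1, \ldots, \alpha_n)$ is a multi-index, we set

$$|\alpha| = \sum_1^n \alpha_j, \qquad \alpha! = \prod_1^n \alpha_j!, \qquad \partial^\alpha = \Big(\frac{\partial}{\partial x_1}\Big)^{\alpha_1} \cdots \Big(\frac{\partial}{\partial x_n}\Big)^{\alpha_n},$$

and if $x = (x_1, \ldots, x_n) \in \mathbb{R}^n$,

$$x^\alpha = \prod_1^n x_j^{\alpha_j}.$$

(The notation $|\alpha| = \sum \alpha_j$ is inconsistent with the notation $|x| = (\sum x_j^2)^{1/2}$, but the meaning will always be clear from the context.) Thus, for example, Taylor's formula for functions $f \in C^k$ reads

$$f(x) = \sum_{|\alpha| \le k} \frac{\partial^\alpha f(x_0)}{\alpha!}(x - x_0)^\alpha + R(x), \qquad \lim_{x \to x_0} \frac{R(x)}{|x - x_0|^k} = 0,$$

and the product rule for derivatives becomes

$$\partial^\alpha(fg) = \sum_{\beta + \gamma = \alpha} \frac{\alpha!}{\beta!\,\gamma!}(\partial^\beta f)(\partial^\gamma g)$$

(Exercise 1).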
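The multi-index conventions translate directly into code. As a hedged sketch (the helper names below are ours, not the text's), here are $|\alpha|$, $\alpha!$, and $x^\alpha$ for integer tuples, used to check exactly, at one integer point, the multinomial identity $(x_1 + \cdots + x_n)^k = \sum_{|\alpha| = k} \frac{k!}{\alpha!} x^\alpha$, which is the $n$-variable form of the binomial theorem in this notation.

```python
from itertools import product
from math import factorial, prod

def abs_alpha(alpha):
    # |alpha| = alpha_1 + ... + alpha_n
    return sum(alpha)

def alpha_factorial(alpha):
    # alpha! = alpha_1! * ... * alpha_n!
    return prod(factorial(a) for a in alpha)

def monomial(x, alpha):
    # x^alpha = x_1^alpha_1 * ... * x_n^alpha_n
    return prod(xj ** aj for xj, aj in zip(x, alpha))

def multi_indices(n, k):
    # all multi-indices alpha in N^n with |alpha| = k
    return [a for a in product(range(k + 1), repeat=n) if sum(a) == k]

def multinomial_rhs(x, k):
    # sum over |alpha| = k of (k! / alpha!) x^alpha; k!/alpha! is an integer
    return sum(factorial(k) // alpha_factorial(a) * monomial(x, a)
               for a in multi_indices(len(x), k))

x = (2, -3, 5)
print(sum(x) ** 4, multinomial_rhs(x, 4))  # prints: 256 256
```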

We shall often avail ourselves of the sloppy but handy device of using the same notation for a function and its value at a point. Thus, “$x^\alpha$” may be used to denote the function whose value at any point $x$ is $x^\alpha$.

Two spaces of $C^\infty$ functions on $\mathbb{R}^n$ will be of particular importance for us. The first is the space $C_c^\infty$ of $C^\infty$ functions with compact support. The existence of nonzero functions in $C_c^\infty$ is not quite obvious; the standard construction is based on the fact that the function $\eta(t) = e^{-1/t}\chi_{(0,\infty)}(t)$ is $C^\infty$ even at the origin (Exercise 3). If we set

$$\phi(x) = \eta(1 - |x|^2) = \begin{cases} e^{1/(|x|^2 - 1)} & \text{if } |x| < 1, \\ 0 & \text{if } |x| \ge 1, \end{cases} \tag{8.1}$$

it follows that $\phi \in C_c^\infty$, and $\operatorname{supp}(\phi)$ is the closed unit ball. In the next section we shall use this single function to manufacture elements of $C_c^\infty$ in great profusion; see Propositions 8.17 and 8.18.
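For experimentation, the bump function (8.1) is easy to evaluate. The sketch below (plain Python, with $x$ given as a tuple of coordinates; the function names are ours) returns $e^{-1}$ at the origin, 0 on and outside the unit sphere, and values that decay extremely fast as $|x| \to 1^-$, a numerical shadow of the fact that all derivatives of $\eta$ vanish at 0.

```python
import math

def eta(t):
    # eta(t) = exp(-1/t) for t > 0 and 0 for t <= 0
    return math.exp(-1.0 / t) if t > 0 else 0.0

def phi(x):
    # phi(x) = eta(1 - |x|^2) for x in R^n, given as a tuple of floats
    r2 = sum(xj * xj for xj in x)
    return eta(1.0 - r2)

print(phi((0.0, 0.0)))   # e^{-1}, the maximum value
print(phi((0.99, 0.0)))  # positive but extremely small (about 1.5e-22)
print(phi((1.0, 0.0)))   # on the unit sphere: 0.0
print(phi((1.5, 0.0)))   # outside the closed unit ball: 0.0
```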

The other space of $C^\infty$ functions we shall need is the Schwartz space $\mathcal{S}$ consisting of those $C^\infty$ functions which, together with all their derivatives, vanish at infinity faster than any power of $|x|$. More precisely, for any nonnegative integer $N$ and any multi-index $\alpha$ we define

$$\|f\|_{(N,\alpha)} = \sup_{x \in \mathbb{R}^n} (1 + |x|)^N |\partial^\alpha f(x)|;$$

then

$$\mathcal{S} = \{f \in C^\infty : \|f\|_{(N,\alpha)} < \infty \text{ for all } N, \alpha\}.$$

Examples of functions in $\mathcal{S}$ are easy to find: for instance, $f(x) = x^\alpha e^{-|x|^2}$ where $\alpha$ is any multi-index. Also, clearly $C_c^\infty \subset \mathcal{S}$.

It is an important observation that if $f \in \mathcal{S}$, then $\partial^\alpha f \in L^p$ for all $\alpha$ and all $p \in [1, \infty]$. Indeed, $|\partial^\alpha f(x)| \le C_N(1 + |x|)^{-N}$ for all $N$, and $(1 + |x|)^{-N} \in L^p$ for $N > n/p$ by Corollary 2.52.





8.2 Proposition. $\mathcal{S}$ is a Fréchet space with the topology defined by the norms $\|\cdot\|_{(N,\alpha)}$.

Proof. The only nontrivial point is completeness. If $\{f_k\}$ is a Cauchy sequence in $\mathcal{S}$, then $\|f_j - f_k\|_{(N,\alpha)} \to 0$ for all $N, \alpha$. In particular, for each $\alpha$ the sequence $\{\partial^\alpha f_k\}$ converges uniformly to a function $g_\alpha$. Denoting by $e_j$ the vector $(0, \ldots, 1, \ldots, 0)$ with the 1 in the $j$th position, we have

$$f_k(x + te_j) - f_k(x) = \int_0^t \partial_j f_k(x + se_j)\,ds.$$

Letting $k \to \infty$, we obtain

$$g_0(x + te_j) - g_0(x) = \int_0^t g_{e_j}(x + se_j)\,ds.$$

The fundamental theorem of calculus implies that $g_{e_j} = \partial_j g_0$, and an induction on $|\alpha|$ then yields $g_\alpha = \partial^\alpha g_0$ for all $\alpha$. It is then easy to check that $\|f_k - g_0\|_{(N,\alpha)} \to 0$ for all $N, \alpha$. ∎


Another useful characterization of $\mathcal{S}$ is the following.

8.3 Proposition. If $f \in C^\infty$, then $f \in \mathcal{S}$ iff $x^\beta \partial^\alpha f$ is bounded for all multi-indices $\alpha, \beta$ iff $\partial^\alpha(x^\beta f)$ is bounded for all multi-indices $\alpha, \beta$.

Proof. Obviously $|x^\beta| \le (1 + |x|)^N$ for $|\beta| \le N$. On the other hand, $\sum_1^n |x_j|^N$ is strictly positive on the unit sphere $|x| = 1$, so it has a positive minimum $\delta$ there. It follows that $\sum_1^n |x_j|^N \ge \delta|x|^N$ for all $x$ since both sides are homogeneous of degree $N$, and hence

$$(1 + |x|)^N \le 2^N \max(1, |x|^N) \le 2^N\Big(1 + \delta^{-1}\sum_1^n |x_j|^N\Big) \le C_N \sum_{|\beta| \le N} |x^\beta|.$$

This establishes the first equivalence. The second one follows from the fact that each $\partial^\alpha(x^\beta f)$ is a linear combination of terms of the form $x^\gamma \partial^\delta f$ and vice versa, by the product rule (Exercise 1). ∎
We next investigate the continuity of translations on various function spaces. The following notation for translations will be used throughout this chapter and the next one: If $f$ is a function on $\mathbb{R}^n$ and $y \in \mathbb{R}^n$,

$$\tau_y f(x) = f(x - y).$$

We observe that $\|\tau_y f\|_p = \|f\|_p$ for $1 \le p \le \infty$ and that $\|\tau_y f\|_u = \|f\|_u$. A function $f$ is called uniformly continuous if $\|\tau_y f - f\|_u \to 0$ as $y \to 0$. (The reader should pause to check that this is equivalent to the usual $\varepsilon$-$\delta$ definition of uniform continuity.)


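8.4 Lemma. If $f \in C_c(\mathbb{R}^n)$, then $f$ is uniformly continuous.

Proof. Given $\varepsilon > 0$, for each $x \in \operatorname{supp}(f)$ there exists $\delta_x > 0$ such that $|f(x - y) - f(x)| < \frac12\varepsilon$ if $|y| < \delta_x$. Since $\operatorname{supp}(f)$ is compact, there exist $x_1, \ldots, x_N$ such that the balls of radius $\frac12\delta_{x_j}$ about $x_j$ cover $\operatorname{supp}(f)$. If $\delta = \frac12\min_j\{\delta_{x_j}\}$, then one easily sees that $\|\tau_y f - f\|_u < \varepsilon$ whenever $|y| < \delta$. ∎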


8.5 Proposition. If $1 \le p < \infty$, translation is continuous in the $L^p$ norm; that is, if $f \in L^p$ and $z \in \mathbb{R}^n$, then $\lim_{y \to 0}\|\tau_{y+z}f - \tau_z f\|_p = 0$.

Proof. Since $\tau_{y+z} = \tau_y\tau_z$, by replacing $f$ by $\tau_z f$ it suffices to assume that $z = 0$. First, if $g \in C_c$, for $|y| \le 1$ the functions $\tau_y g$ are all supported in a common compact set $K$, so by Lemma 8.4,

$$\int |\tau_y g - g|^p \le \|\tau_y g - g\|_u^p\, m(K) \to 0 \quad \text{as } y \to 0.$$

Now suppose $f \in L^p$. If $\varepsilon > 0$, by Proposition 7.9 there exists $g \in C_c$ with $\|g - f\|_p < \varepsilon/3$, so

$$\|\tau_y f - f\|_p \le \|\tau_y(f - g)\|_p + \|\tau_y g - g\|_p + \|g - f\|_p < \tfrac23\varepsilon + \|\tau_y g - g\|_p,$$

and $\|\tau_y g - g\|_p < \varepsilon/3$ if $y$ is sufficiently small. ∎


Proposition 8.5 is false for $p = \infty$, as one should expect since the $L^\infty$ norm is closely related to the uniform norm; see Exercise 4.
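A standard example (ours, not the text's) shows the contrast between $p < \infty$ and $p = \infty$: for $f = \chi_{[0,1]}$ on $\mathbb{R}$ and $0 < y < 1$, $\tau_y f - f$ has absolute value 1 on $[0, y) \cup (1, 1 + y]$ and vanishes elsewhere, so $\|\tau_y f - f\|_1 = 2y \to 0$ while $\|\tau_y f - f\|_\infty = 1$. The sketch below checks this with a midpoint rule; the sampled maximum only approximates the $L^\infty$ norm, which is harmless here because the functions take only the values 0 and $\pm 1$.

```python
def indicator(x):
    # characteristic function of [0, 1]
    return 1.0 if 0.0 <= x <= 1.0 else 0.0

def translate(f, y):
    # (tau_y f)(x) = f(x - y)
    return lambda x: f(x - y)

def norms_of_difference(y, a=-2.0, b=3.0, steps=100_000):
    """Midpoint-rule value of ||tau_y f - f||_1 on [a, b] and the largest
    sampled value of |tau_y f - f|, for f = indicator."""
    h = (b - a) / steps
    g = translate(indicator, y)
    vals = [abs(g(a + (k + 0.5) * h) - indicator(a + (k + 0.5) * h))
            for k in range(steps)]
    return sum(vals) * h, max(vals)

for y in (0.5, 0.1, 0.01, 0.001):
    l1, sup = norms_of_difference(y)
    print(y, l1, sup)  # the L^1 norm is about 2*y; the sup stays 1.0
```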

Some of our results will concern multiply periodic functions on $\mathbb{R}^n$, and for simplicity we shall take the fundamental period in each variable to be 1. That is, we define a function $f$ on $\mathbb{R}^n$ to be periodic if $f(x + k) = f(x)$ for all $x \in \mathbb{R}^n$ and $k \in \mathbb{Z}^n$. Every periodic function is thus completely determined by its values on the unit cube

$$Q = [-\tfrac12, \tfrac12)^n.$$

Periodic functions may be regarded as functions on the space $\mathbb{R}^n/\mathbb{Z}^n \cong (\mathbb{R}/\mathbb{Z})^n$ of cosets of $\mathbb{Z}^n$, which we call the $n$-dimensional torus and denote by $\mathbb{T}^n$. (When $n = 1$ we write $\mathbb{T}$ rather than $\mathbb{T}^1$.) $\mathbb{T}^n$ is a compact Hausdorff space; it may be identified with the set of all $z = (z_1, \ldots, z_n) \in \mathbb{C}^n$ such that $|z_j| = 1$ for all $j$, via the map

$$(x_1, \ldots, x_n) + \mathbb{Z}^n \mapsto (e^{2\pi i x_1}, \ldots, e^{2\pi i x_n}).$$

On the other hand, for measure-theoretic purposes we identify $\mathbb{T}^n$ with the unit cube $Q$, and when we speak of Lebesgue measure on $\mathbb{T}^n$ we mean the measure induced on $\mathbb{T}^n$ by Lebesgue measure on $Q$. In particular, $m(\mathbb{T}^n) = 1$. Functions on $\mathbb{T}^n$ may be considered as periodic functions on $\mathbb{R}^n$ or as functions on $Q$; the point of view will be clear from the context when it matters.
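Both identifications of $\mathbb{T}^n$ are simple to compute. The hedged sketch below (function names are ours) sends a point of $\mathbb{R}^n$ to the corresponding point of the complex torus and to its representative in $Q = [-\frac12, \frac12)^n$; points that differ by an element of $\mathbb{Z}^n$ land in the same place, up to floating-point rounding.

```python
import cmath
import math

def to_torus(x):
    # (x_1, ..., x_n) + Z^n  ->  (e^{2 pi i x_1}, ..., e^{2 pi i x_n})
    return tuple(cmath.exp(2j * math.pi * xj) for xj in x)

def reduce_to_cube(x):
    # representative of x + Z^n in Q = [-1/2, 1/2)^n
    return tuple(xj - math.floor(xj + 0.5) for xj in x)

x = (0.3, -1.7)
k = (2, -5)
shifted = tuple(xj + kj for xj, kj in zip(x, k))
print(reduce_to_cube(x), reduce_to_cube(shifted))  # both approximately (0.3, 0.3)
print(all(abs(a - b) < 1e-12 for a, b in zip(to_torus(x), to_torus(shifted))))
```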

Exercises 

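1. Prove the product rule for partial derivatives as stated in the text. Deduce that

$$\partial^\alpha(x^\beta f) = x^\beta \partial^\alpha f + \sum c_{\gamma\delta}\, x^\delta \partial^\gamma f, \qquad x^\beta \partial^\alpha f = \partial^\alpha(x^\beta f) + \sum c'_{\gamma\delta}\, \partial^\gamma(x^\delta f)$$

for some constants $c_{\gamma\delta}$ and $c'_{\gamma\delta}$ with $c_{\gamma\delta} = c'_{\gamma\delta} = 0$ unless $|\gamma| < |\alpha|$ and $|\delta| < |\beta|$.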


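2. Observe that the binomial theorem can be written as follows:

$$(x_1 + x_2)^k = \sum_{|\alpha| = k} \frac{k!}{\alpha!} x^\alpha \qquad (x = (x_1, x_2),\ \alpha = (\alpha_1, \alpha_2)).$$

Prove the following generalizations:
a. The multinomial theorem: If $x \in \mathbb{R}^n$,

$$(x_1 + \cdots + x_n)^k = \sum_{|\alpha| = k} \frac{k!}{\alpha!} x^\alpha.$$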


3. Let $\eta(t) = e^{-1/t}$ for $t > 0$, $\eta(t) = 0$ for $t \le 0$.
a. For $k \in \mathbb{N}$ and $t > 0$, $\eta^{(k)}(t) = P_k(1/t)e^{-1/t}$ where $P_k$ is a polynomial of degree $2k$.
b. $\eta^{(k)}(0)$ exists and equals zero for all $k \in \mathbb{N}$.


4. If $f \in L^\infty$ and $\|\tau_y f - f\|_\infty \to 0$ as $y \to 0$, then $f$ agrees a.e. with a uniformly continuous function. (Let $A_r f$ be as in Theorem 3.18. Then $A_r f$ is uniformly continuous for $r > 0$ and uniformly Cauchy as $r \to 0$.)


8.2 CONVOLUTIONS 


Let $f$ and $g$ be measurable functions on $\mathbb{R}^n$. The convolution of $f$ and $g$ is the function $f * g$ defined by

$$f * g(x) = \int f(x - y)g(y)\,dy$$

for all $x$ such that the integral exists. Various conditions can be imposed on $f$ and $g$ to guarantee that $f * g$ is defined at least almost everywhere. For example, if $f$ is bounded and compactly supported, $g$ can be any locally integrable function; see also Propositions 8.7-8.9 below.

In what follows, we shall need the fact that if $f$ is a measurable function on $\mathbb{R}^n$, then the function $K(x, y) = f(x - y)$ is measurable on $\mathbb{R}^n \times \mathbb{R}^n$. We have $K = f \circ s$ where $s(x, y) = x - y$; since $s$ is continuous, $K$ is Borel measurable if $f$ is Borel measurable. This can always be assumed without affecting the definition of $f * g$, by Proposition 2.12. However, the Lebesgue measurability of $K$ also follows from the Lebesgue measurability of $f$; see Exercise 5.

The elementary properties of convolutions are summarized in the following propo- 
sition. 


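8.6 Proposition. Assuming that all integrals in question exist, we have
a. $f * g = g * f$.
b. $(f * g) * h = f * (g * h)$.
c. For $z \in \mathbb{R}^n$, $\tau_z(f * g) = (\tau_z f) * g = f * (\tau_z g)$.
d. If $A$ is the closure of $\{x + y : x \in \operatorname{supp}(f),\ y \in \operatorname{supp}(g)\}$, then $\operatorname{supp}(f * g) \subset A$.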


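Proof. (a) is proved by the substitution $z = x - y$:

$$f * g(x) = \int f(x - y)\, g(y)\, dy = \int f(z)\, g(x - z)\, dz = g * f(x).$$

(b) follows from (a) and Fubini's theorem:

$$(f * g) * h(x) = \iint f(y)\, g(z - y)\, h(x - z)\, dy\, dz = \iint f(y)\, g(x - y - z)\, h(z)\, dz\, dy = f * (g * h)(x).$$

As for (c),

$$\tau_z(f * g)(x) = \int f(x - z - y)\, g(y)\, dy = \int \tau_z f(x - y)\, g(y)\, dy = (\tau_z f) * g(x),$$

and by (a),

$$\tau_z(f * g) = \tau_z(g * f) = (\tau_z g) * f = f * (\tau_z g).$$

For (d), we observe that if $x \notin A$, then for any $y \in \operatorname{supp}(g)$ we have $x - y \notin \operatorname{supp}(f)$; hence $f(x - y)g(y) = 0$ for all $y$, so $f * g(x) = 0$. ∎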


The following two propositions contain the basic facts about convolutions of $L^p$ functions.

8.7 Young's Inequality. If $f \in L^1$ and $g \in L^p$ ($1 \le p \le \infty$), then $f * g(x)$ exists for almost every $x$, $f * g \in L^p$, and $\|f * g\|_p \le \|f\|_1 \|g\|_p$.

Proof. This is a special case of Theorem 6.18, with $K(x, y) = f(x - y)$. Alternatively, one can use Minkowski's inequality for integrals:

$$\|f * g\|_p = \Big\| \int f(y)\, g(\cdot - y)\, dy \Big\|_p \le \int |f(y)|\, \|\tau_y g\|_p\, dy = \|f\|_1 \|g\|_p.$$


8.8 Proposition. If $p$ and $q$ are conjugate exponents, $f \in L^p$, and $g \in L^q$, then $f * g(x)$ exists for every $x$, $f * g$ is bounded and uniformly continuous, and $\|f * g\|_u \le \|f\|_p \|g\|_q$. If $1 < p < \infty$ (so that $1 < q < \infty$ also), then $f * g \in C_0(\mathbb{R}^n)$.

Proof. The existence of $f * g$ and the estimate for $\|f * g\|_u$ follow immediately from Hölder's inequality. In view of Propositions 8.5 and 8.6, so does the uniform continuity of $f * g$: if $1 \le p < \infty$,

$$\|\tau_y(f * g) - f * g\|_u = \|(\tau_y f - f) * g\|_u \le \|\tau_y f - f\|_p \|g\|_q \to 0 \quad \text{as } y \to 0.$$

(If $p = \infty$, interchange the roles of $f$ and $g$.) Finally, if $1 < p, q < \infty$, choose sequences $\{f_n\}$ and $\{g_n\}$ of functions with compact support such that $\|f_n - f\|_p \to 0$ and $\|g_n - g\|_q \to 0$. By Proposition 8.6d and what we have just proved, $f_n * g_n \in C_c$. But

$$\|f_n * g_n - f * g\|_u \le \|f_n - f\|_p \|g_n\|_q + \|f\|_p \|g_n - g\|_q \to 0,$$

so $f * g \in C_0$ by Proposition 4.35. ∎


The preceding results are all we shall use, but for the sake of completeness we state also the following generalization.

8.9 Proposition. Suppose $1 \le p, q, r \le \infty$ and $p^{-1} + q^{-1} = r^{-1} + 1$.
a. (Young's Inequality, General Form) If $f \in L^p$ and $g \in L^q$, then $f * g \in L^r$ and $\|f * g\|_r \le \|f\|_p \|g\|_q$.
b. Suppose also that $p > 1$, $q > 1$, and $r < \infty$. If $f \in L^p$ and $g \in$ weak $L^q$, then $f * g \in L^r$ and $\|f * g\|_r \le C_{pq} \|f\|_p [g]_q$, where $C_{pq}$ is independent of $f$ and $g$.
c. Suppose that $p = 1$ and $r = q > 1$. If $f \in L^1$ and $g \in$ weak $L^q$, then $f * g \in$ weak $L^q$ and $[f * g]_q \le C_q \|f\|_1 [g]_q$, where $C_q$ is independent of $f$ and $g$.

Proof. To prove (a), let $q$ be fixed. The special cases $p = 1$, $r = q$ and $p = q/(q - 1)$, $r = \infty$ are Propositions 8.7 and 8.8. The general case then follows from the Riesz–Thorin interpolation theorem. (See also Exercise 6 for a direct proof.) (b) and (c) are special cases of Theorem 6.36. ∎
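Young's inequality has an exact analogue for sequences on $\mathbb{Z}$ with counting measure, and the constant is again 1. The sketch below checks the general form 8.9a numerically for a few exponent pairs, using finite random sequences. It illustrates the inequality and proves nothing. The names `conv`, `norm`, `young_holds`, and `PAIRS` are hypothetical.

```python
import random

def conv(a, b):
    """Discrete convolution (a * b)[k] = sum_j a[k - j] b[j] of finite sequences on Z."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out

def norm(v, p):
    """l^p norm with respect to counting measure; p = float('inf') gives the sup norm."""
    if p == float("inf"):
        return max(abs(x) for x in v)
    return sum(abs(x) ** p for x in v) ** (1.0 / p)

def young_holds(a, b, p, q):
    # r is determined by 1/p + 1/q = 1/r + 1 (requires 1/p + 1/q >= 1)
    s = 1.0 / p + 1.0 / q - 1.0
    r = float("inf") if s == 0 else 1.0 / s
    return norm(conv(a, b), r) <= norm(a, p) * norm(b, q) + 1e-12

random.seed(0)
a = [random.uniform(-1, 1) for _ in range(50)]
b = [random.uniform(-1, 1) for _ in range(80)]
PAIRS = [(1, 1), (1, 2), (1, float("inf")), (1.5, 1.5), (2, 2), (4 / 3, 2)]
for p, q in PAIRS:
    print(p, q, young_holds(a, b, p, q))
```

The pairs with $p = 1$ are the discrete form of Proposition 8.7. The pair $(2, 2)$ gives $r = \infty$, which is the Hölder case of Proposition 8.8.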


One of the most important properties of convolution is that, roughly speaking, $f * g$ is at least as smooth as either $f$ or $g$, because formally we have

$$\partial^\alpha(f * g)(x) = \partial^\alpha \int f(x - y)\, g(y)\, dy = \int \partial^\alpha f(x - y)\, g(y)\, dy = (\partial^\alpha f) * g(x),$$

and similarly $\partial^\alpha(f * g) = f * (\partial^\alpha g)$. To make this precise, one needs only to impose conditions on $f$ and $g$ so that differentiation under the integral sign is legitimate. One such result is the following; see also Exercises 7 and 8.

8.10 Proposition. If $f \in L^1$, $g \in C^k$, and $\partial^\alpha g$ is bounded for $|\alpha| \le k$, then $f * g \in C^k$ and $\partial^\alpha(f * g) = f * (\partial^\alpha g)$ for $|\alpha| \le k$.

Proof. This is clear from Theorem 2.27. ∎


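8.11 Proposition. If $f, g \in \mathcal{S}$, then $f * g \in \mathcal{S}$.

Proof. First, $f * g \in C^\infty$ by Proposition 8.10. Since

$$1 + |x| \le 1 + |x - y| + |y| \le (1 + |x - y|)(1 + |y|), \tag{8.12}$$

we have

$$(1 + |x|)^N |\partial^\alpha(f * g)(x)| \le \int (1 + |x - y|)^N |\partial^\alpha f(x - y)|\, (1 + |y|)^N |g(y)|\, dy \le \|f\|_{(N,\alpha)} \|g\|_{(N+n+1,0)} \int (1 + |y|)^{-n-1}\, dy,$$

which is finite by Corollary 2.52. ∎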


Convolutions of functions on the torus $\mathbb{T}^n$ are defined just as for functions on $\mathbb{R}^n$. (If one regards functions on $\mathbb{T}^n$ as periodic functions on $\mathbb{R}^n$, of course, the integration is to be extended over the unit cube rather than $\mathbb{R}^n$.) All of the preceding results remain valid, with the same proofs.

The following theorem underlies many of the important applications of convolutions on $\mathbb{R}^n$. We introduce a bit of notation that will be used frequently hereafter: If $\phi$ is any function on $\mathbb{R}^n$ and $t > 0$, we set

$$\phi_t(x) = t^{-n}\phi(t^{-1}x). \tag{8.13}$$

We observe that if $\phi \in L^1$, then $\int \phi_t$ is independent of $t$, by Theorem 2.44:

$$\int \phi_t(x)\, dx = \int t^{-n}\phi(t^{-1}x)\, dx = \int \phi(z)\, dz.$$

Moreover, the "mass" of $\phi_t$ becomes concentrated at the origin as $t \to 0$. (Draw a picture if this isn't clear.)
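Both observations about (8.13) can be checked numerically for $n = 1$. The sketch below uses the standard normal density as the sample $\phi$ and a midpoint rule for the integrals. It shows that $\int \phi_t$ stays equal to 1, while the share of the mass lying in $[-0.2, 0.2]$ grows as $t \to 0$. For this particular $\phi$ that share is exactly $\operatorname{erf}(0.2/(t\sqrt2))$.

```python
import math

def phi(x):
    # sample phi in L^1(R): the standard normal density, whose integral is 1
    return math.exp(-x * x / 2.0) / math.sqrt(2.0 * math.pi)

def phi_t(x, t):
    # dilation (8.13) with n = 1: phi_t(x) = t^{-1} phi(x / t)
    return phi(x / t) / t

def integral(f, a, b, m=20000):
    # midpoint rule on [a, b]
    h = (b - a) / m
    return sum(f(a + (i + 0.5) * h) for i in range(m)) * h

for t in (1.0, 0.5, 0.1):
    total = integral(lambda x: phi_t(x, t), -20.0, 20.0)
    near_origin = integral(lambda x: phi_t(x, t), -0.2, 0.2)
    print(t, total, near_origin)
```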


8.14 Theorem. Suppose $\phi \in L^1$ and $\int \phi(x)\, dx = a$.
a. If $f \in L^p$ ($1 \le p < \infty$), then $f * \phi_t \to af$ in the $L^p$ norm as $t \to 0$.
b. If $f$ is bounded and uniformly continuous, then $f * \phi_t \to af$ uniformly as $t \to 0$.
c. If $f \in L^\infty$ and $f$ is continuous on an open set $U$, then $f * \phi_t \to af$ uniformly on compact subsets of $U$ as $t \to 0$.
Proof. Setting $y = tz$, we have

$$f * \phi_t(x) - af(x) = \int [f(x - y) - f(x)]\, \phi_t(y)\, dy = \int [f(x - tz) - f(x)]\, \phi(z)\, dz.$$

Apply Minkowski's inequality for integrals:

$$\|f * \phi_t - af\|_p \le \int \|\tau_{tz} f - f\|_p\, |\phi(z)|\, dz.$$

Now, $\|\tau_{tz} f - f\|_p$ is bounded by $2\|f\|_p$ and tends to 0 as $t \to 0$ for each $z$, by Proposition 8.5. Assertion (a) therefore follows from the dominated convergence theorem.

The proof of (b) is exactly the same, with $\|\cdot\|_p$ replaced by $\|\cdot\|_u$. The estimate for $\|f * \phi_t - af\|_u$ is obvious, and $\|\tau_{tz} f - f\|_u \to 0$ as $t \to 0$ by the uniform continuity of $f$.

As for (c), given $\epsilon > 0$ let us choose a compact $E \subset \mathbb{R}^n$ such that $\int_{E^c} |\phi| < \epsilon$. Also, let $K$ be a compact subset of $U$. If $t$ is sufficiently small, then we will have $x - tz \in U$ for all $x \in K$ and $z \in E$, so from the compactness of $K$ it follows as in Lemma 8.4 that

$$\sup_{x \in K,\ z \in E} |f(x - tz) - f(x)| < \epsilon$$

for small $t$. But then

$$\sup_{x \in K} |f * \phi_t(x) - af(x)| \le \sup_{x \in K} \Big(\int_E + \int_{E^c}\Big) |f(x - tz) - f(x)|\, |\phi(z)|\, dz \le \epsilon \int |\phi| + 2\|f\|_\infty \epsilon,$$

from which (c) follows. ∎
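Here is a quick numerical illustration of Theorem 8.14b. It takes $n = 1$ and $\phi = \chi_{[-1/2,1/2]}$ (so $a = 1$), together with $f = \cos$, which is bounded and uniformly continuous. In this case $f * \phi_t(x) = \frac{2}{t}\sin\frac{t}{2}\cos x$ exactly. The sup-norm error is therefore $|\frac{2}{t}\sin\frac{t}{2} - 1|$, which is roughly $t^2/24$. The sketch computes the convolution by a midpoint rule in the variable $z = y/t$, as in the proof, and compares the result with this exact value. The function names are hypothetical.

```python
import math

def box(z):
    # phi = characteristic function of [-1/2, 1/2], so a = ∫ phi = 1
    return 1.0 if -0.5 <= z <= 0.5 else 0.0

def smooth(f, x, t, lo=-0.5, hi=0.5, m=1000):
    """Approximate f * phi_t(x) = ∫ f(x - t z) phi(z) dz (the substitution y = t z used in the
    proof) by the midpoint rule over [lo, hi], which contains supp(phi)."""
    h = (hi - lo) / m
    return sum(f(x - t * z) * box(z) for z in (lo + (i + 0.5) * h for i in range(m))) * h

grid = [0.1 * k for k in range(-30, 31)]
for t in (1.0, 0.5, 0.25, 0.125):
    err = max(abs(smooth(math.cos, x, t) - math.cos(x)) for x in grid)
    exact = abs(2.0 * math.sin(t / 2.0) / t - 1.0)   # attained at x = 0
    print(t, err, exact)
```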


If we impose slightly stronger conditions on $\phi$, we can also show that $f * \phi_t \to af$ almost everywhere for $f \in L^p$. The device in the following proof of breaking up an integral into pieces corresponding to the dyadic intervals $[2^k, 2^{k+1}]$ and estimating each piece separately is a standard trick of the trade in Fourier analysis.

8.15 Theorem. Suppose $|\phi(x)| \le C(1 + |x|)^{-n-\epsilon}$ for some $C, \epsilon > 0$ (which implies that $\phi \in L^1$ by Corollary 2.52), and $\int \phi(x)\, dx = a$. If $f \in L^p$ ($1 \le p \le \infty$), then $f * \phi_t(x) \to af(x)$ as $t \to 0$ for every $x$ in the Lebesgue set of $f$; in particular, this holds for almost every $x$, and for every $x$ at which $f$ is continuous.





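Proof. If $x$ is in the Lebesgue set of $f$, for any $\delta > 0$ there exists $\eta > 0$ such that

$$\int_{|y| < r} |f(x - y) - f(x)|\, dy < \delta r^n \quad \text{for } r \le \eta. \tag{8.16}$$

Let us set

$$I_1 = \int_{|y| < \eta} |f(x - y) - f(x)|\, |\phi_t(y)|\, dy, \qquad I_2 = \int_{|y| \ge \eta} |f(x - y) - f(x)|\, |\phi_t(y)|\, dy.$$

We claim that $I_1$ is bounded by $A\delta$ where $A$ is independent of $t$, whereas $I_2 \to 0$ as $t \to 0$. Since

$$|f * \phi_t(x) - af(x)| \le I_1 + I_2,$$

we will have

$$\limsup_{t \to 0} |f * \phi_t(x) - af(x)| \le A\delta,$$

and since $\delta$ is arbitrary, this will complete the proof.

To estimate $I_1$, let $K$ be the integer such that $2^K \le \eta/t < 2^{K+1}$ if $\eta/t \ge 1$, and $K = 0$ if $\eta/t < 1$. We view the ball $|y| < \eta$ as the union of the annuli $2^{-k}\eta \le |y| < 2^{1-k}\eta$ ($1 \le k \le K$) and the ball $|y| < 2^{-K}\eta$. On the $k$th annulus we use the estimate

$$|\phi_t(y)| \le Ct^{-n}\Big(\frac{2^{-k}\eta}{t}\Big)^{-n-\epsilon},$$

and on the ball $|y| < 2^{-K}\eta$ we use the estimate $|\phi_t(y)| \le Ct^{-n}$. Thus

$$I_1 \le \sum_{k=1}^K Ct^{-n}\Big(\frac{2^{-k}\eta}{t}\Big)^{-n-\epsilon} \int_{2^{-k}\eta \le |y| < 2^{1-k}\eta} |f(x - y) - f(x)|\, dy + Ct^{-n} \int_{|y| < 2^{-K}\eta} |f(x - y) - f(x)|\, dy.$$

Therefore, by (8.16) and the fact that $2^K \le \eta/t < 2^{K+1}$,

$$\begin{aligned} I_1 &\le C\delta \sum_{k=1}^K (2^{1-k}\eta)^n t^{-n} \Big(\frac{2^{-k}\eta}{t}\Big)^{-n-\epsilon} + C\delta\, (2^{-K}\eta)^n t^{-n} \\ &= 2^n C\delta \sum_{k=1}^K \Big(\frac{2^{-k}\eta}{t}\Big)^{-\epsilon} + C\delta \Big(\frac{2^{-K}\eta}{t}\Big)^n \\ &\le 2^n C\delta \sum_{k=1}^K 2^{(k-K)\epsilon} + 2^n C\delta \\ &\le 2^n C\big[2^\epsilon(2^\epsilon - 1)^{-1} + 1\big]\delta. \end{aligned}$$

As for $I_2$, if $p'$ is the conjugate exponent to $p$ and $\chi$ is the characteristic function of $\{y : |y| \ge \eta\}$, by Hölder's inequality we have

$$I_2 \le \int_{|y| \ge \eta} |f(x - y)|\, |\phi_t(y)|\, dy + |f(x)| \int_{|y| \ge \eta} |\phi_t(y)|\, dy \le \|f\|_p \|\chi\phi_t\|_{p'} + |f(x)|\, \|\chi\phi_t\|_1,$$

so it suffices to show that for $1 \le q \le \infty$, and in particular for $q = 1$ and $q = p'$, $\|\chi\phi_t\|_q \to 0$ as $t \to 0$. If $q = \infty$, this is obvious:

$$\|\chi\phi_t\|_\infty \le Ct^{-n}(1 + \eta/t)^{-n-\epsilon} = Ct^\epsilon(t + \eta)^{-n-\epsilon} \le C\eta^{-n-\epsilon}t^\epsilon.$$

If $q < \infty$, by Corollary 2.51 we have

$$\|\chi\phi_t\|_q^q = \int_{|y| \ge \eta} t^{-nq}|\phi(t^{-1}y)|^q\, dy = t^{n - nq} \int_{|z| \ge \eta/t} |\phi(z)|^q\, dz \le C' t^{n - nq} \int_{\eta/t}^\infty r^{n-1-(n+\epsilon)q}\, dr = C'' t^{\epsilon q}.$$

In either case, $\|\chi\phi_t\|_q$ is dominated by $t^\epsilon$, so we are done. ∎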
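Theorem 8.15 can be watched in a case where everything is explicit. Take $n = 1$, $f = \chi_{[0,1]}$, and $\phi(x) = \pi^{-1}(1 + x^2)^{-1}$. This $\phi$ satisfies the hypothesis with $\epsilon = 1$ and has $a = 1$, and $f * \phi_t(x) = \pi^{-1}[\arctan(x/t) - \arctan((x - 1)/t)]$. The sketch below, an illustration under these assumptions, evaluates this formula at two points:

- at the continuity point $x = 1/2$, where the limit is $f(1/2) = 1$;
- at the jump $x = 0$, which is not in the Lebesgue set of $f$. There the limit is $1/2$ rather than $f(0) = 1$.

```python
import math

def poisson_smoothed_indicator(x, t):
    """f * phi_t(x) for f = characteristic function of [0, 1] and
    phi(x) = 1 / (pi (1 + x^2)), in closed form:
    (1/pi) [arctan(x / t) - arctan((x - 1) / t)]."""
    return (math.atan(x / t) - math.atan((x - 1.0) / t)) / math.pi

for t in (0.1, 0.01, 0.001):
    # x = 0.5 is a continuity point of f; x = 0.0 is a jump of f
    print(t, poisson_smoothed_indicator(0.5, t), poisson_smoothed_indicator(0.0, t))
```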


In most of the applications of the preceding two theorems one has $a = 1$, although the case $a = 0$ is also useful. If $a = 1$, $\{\phi_t\}_{t > 0}$ is called an approximate identity, as it furnishes an approximation to the identity operator on $L^p$ by convolution operators. This construction is useful for approximating $L^p$ functions by functions having specified regularity properties. For example, we have the following two important results:


8.17 Proposition. $C_c^\infty$ (and hence also $\mathcal{S}$) is dense in $L^p$ ($1 \le p < \infty$) and in $C_0$.

Proof. Given $f \in L^p$ and $\epsilon > 0$, there exists $g \in C_c$ with $\|f - g\|_p < \epsilon/2$, by Proposition 7.9. Let $\phi$ be a function in $C_c^\infty$ such that $\int \phi = 1$ (for example, $\phi = (\int \psi)^{-1}\psi$ where $\psi$ is as in (8.1)). Then $g * \phi_t \in C_c^\infty$ by Propositions 8.6d and 8.10, and $\|g * \phi_t - g\|_p < \epsilon/2$ for sufficiently small $t$ by Theorem 8.14. The same argument applies if $L^p$ is replaced by $C_0$, $\|\cdot\|_p$ by $\|\cdot\|_u$, and Proposition 7.9 by Proposition 4.35. ∎

8.18 The $C^\infty$ Urysohn Lemma. If $K \subset \mathbb{R}^n$ is compact and $U$ is an open set containing $K$, there exists $f \in C_c^\infty$ such that $0 \le f \le 1$, $f = 1$ on $K$, and $\operatorname{supp}(f) \subset U$.

Proof. Let $\delta = \rho(K, U^c)$ (the distance from $K$ to $U^c$, which is positive since $K$ is compact), and let $V = \{x : \rho(x, K) < \delta/3\}$. Choose a nonnegative $\phi \in C_c^\infty$ such that $\int \phi = 1$ and $\phi(x) = 0$ for $|x| \ge \delta/3$ (for example, $\phi = (\int \psi)^{-1}\psi_{\delta/3}$ with $\psi$ as in (8.1)), and set $f = \chi_V * \phi$. Then $f \in C_c^\infty$ by Propositions 8.6d and 8.10, and it is easily checked that $0 \le f \le 1$, $f = 1$ on $K$, and $\operatorname{supp}(f) \subset \{x : \rho(x, K) \le 2\delta/3\} \subset U$. ∎
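The construction in the proof of 8.18 can be sketched numerically in one dimension. The setup is as follows:

- $K = [-1, 1]$ and $U = (-2, 2)$, so $\delta = 1$ and $V = (-4/3, 4/3)$;
- the bump is $\eta(1 - x^2)$, with $\eta$ as in Exercise 3;
- $\phi$ is a normalized, rescaled copy of that bump, supported in $[-\delta/3, \delta/3]$.

The code evaluates $f = \chi_V * \phi$. The integral is replaced by a midpoint rule, and the same rule is used for the normalization, so the output only approximates the smooth function $f$. The name `urysohn` is hypothetical.

```python
import math

def bump(x):
    # eta(1 - x^2) with eta(t) = exp(-1/t) for t > 0 and 0 otherwise (1-D case)
    s = 1.0 - x * x
    return math.exp(-1.0 / s) if s > 0 else 0.0

def urysohn(x, K=(-1.0, 1.0), delta=1.0, m=2000):
    """Sketch of f = chi_V * phi in dimension 1 for K = [-1, 1] and U = (-2, 2), so that
    delta = dist(K, U^c) = 1 and V = {x : dist(x, K) < delta/3}.
    phi(z) is proportional to bump(3z/delta), normalized by the same midpoint rule."""
    r = delta / 3.0
    lo, hi = K[0] - r, K[1] + r                   # V = (lo, hi)
    h = 2.0 * r / m
    zs = [-r + (i + 0.5) * h for i in range(m)]   # midpoints of [-r, r], which contains supp(phi)
    weights = [bump(z / r) for z in zs]
    total = sum(weights)
    inside = sum(w for z, w in zip(zs, weights) if lo < x - z < hi)
    return inside / total

for x in (0.0, 1.0, 1.2, 4.0 / 3.0, 1.5, 1.9):
    print(x, urysohn(x))
```

On $K$ the value is exactly 1, and at distance at least $2\delta/3$ from $K$ it is exactly 0. In between, the sketch passes through $1/2$ at the edge of $V$.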


Exercises

5. If $s: \mathbb{R}^n \times \mathbb{R}^n \to \mathbb{R}^n$ is defined by $s(x, y) = x - y$, then $s^{-1}(E)$ is Lebesgue measurable whenever $E$ is Lebesgue measurable. (For $n = 1$, draw a picture of $s^{-1}(E) \subset \mathbb{R}^2$. It should be clear that after rotation through an angle $\pi/4$, $s^{-1}(E)$ becomes $F \times \mathbb{R}$ where $F = \{x : \sqrt{2}\, x \in E\}$, and Theorem 2.44 can be applied. The same idea works in higher dimensions.)

6. Prove Theorem 8.9a by using Exercise 31 in §6.3 to show that

$$|f * g(x)|^r \le \|f\|_p^{r-p} \|g\|_q^{r-q} \int |f(y)|^p |g(x - y)|^q\, dy.$$

7. If $f$ is locally integrable on $\mathbb{R}^n$ and $g \in C^k$ has compact support, then $f * g \in C^k$.

8. Suppose that $f \in L^p(\mathbb{R})$. If there exists $h \in L^p(\mathbb{R})$ such that

$$\lim_{y \to 0} \|y^{-1}(\tau_{-y} f - f) - h\|_p = 0,$$

we call $h$ the (strong) $L^p$ derivative of $f$. If $f \in L^p(\mathbb{R}^n)$, $L^p$ partial derivatives of $f$ are defined similarly. Suppose that $p$ and $q$ are conjugate exponents, $f \in L^p$, $g \in L^q$, and the $L^p$ derivative $\partial_j f$ exists. Then $\partial_j(f * g)$ exists (in the ordinary sense) and equals $(\partial_j f) * g$.

9. If $f \in L^p(\mathbb{R})$, the $L^p$ derivative of $f$ (call it $h$; see Exercise 8) exists iff $f$ is absolutely continuous on every bounded interval (perhaps after modification on a null set) and its pointwise derivative $f'$ is in $L^p$, in which case $h = f'$ a.e. (For "only if," use Exercise 8: If $g \in C_c$ with $\int g = 1$, then $f * g_t \to f$ and $(f * g_t)' \to h$ as $t \to 0$. For "if," write

$$\frac{f(x + y) - f(x)}{y} - f'(x) = \frac{1}{y}\int_0^y [f'(x + t) - f'(x)]\, dt$$

and use Minkowski's inequality for integrals.)

10. Let $\phi$ satisfy the hypotheses of Theorem 8.15. If $f \in L^p$ ($1 \le p \le \infty$), define the $\phi$-maximal function of $f$ to be $M_\phi f(x) = \sup_{t > 0} |f * \phi_t(x)|$. (Observe that the Hardy–Littlewood maximal function $Hf$ is $M_\phi|f|$ where $\phi$ is the characteristic function of the unit ball divided by the volume of the ball.) Show that there is a constant $C$, independent of $f$, such that $M_\phi f \le C \cdot Hf$. (Break up the integral $\int f(x - y)\phi_t(y)\, dy$ as the sum of the integrals over $|y| \le t$ and over $2^k t < |y| \le 2^{k+1}t$ ($k = 0, 1, 2, \ldots$), and estimate $\phi_t$ on each region.) It follows from Theorem 3.17 that $M_\phi$ is weak type (1, 1), and the proof of Theorem 3.18 can then be adapted to give an alternate demonstration that $f * \phi_t \to (\int \phi)f$ a.e.

11. Young's inequality shows that $L^1$ is a Banach algebra, the product being convolution.
a. If $J$ is an ideal in the algebra $L^1$, so is its closure in $L^1$.
b. If $f \in L^1$, the smallest closed ideal in $L^1$ containing $f$ is the smallest closed subspace of $L^1$ containing all translates of $f$. (If $g \in C_c$, $f * g(x)$ can be approximated by sums $\sum f(x - y_j)g(y_j)\Delta y_j$. On the other hand, if $\{\phi_t\}$ is an approximate identity, $f * (\tau_y \phi_t) \to \tau_y f$ as $t \to 0$.)
8.3 THE FOURIER TRANSFORM

One of the fundamental principles of harmonic analysis is the exploitation of symmetry. To be more specific, if one is doing analysis on a space on which a group acts, it is a good idea to study functions (or other analytic objects) that transform in simple ways under the group action, and then try to decompose arbitrary functions as sums or integrals of these basic functions.

The spaces we are studying are $\mathbb{R}^n$ and $\mathbb{T}^n$, which are Abelian groups under addition and act on themselves by translation. The building blocks of harmonic analysis on these spaces are the functions that transform under translation by multiplication by a factor of absolute value one, that is, functions $f$ such that for each $x$ there is a number $\phi(x)$ with $|\phi(x)| = 1$ such that $f(y + x) = \phi(x)f(y)$ for all $y$. If $f$ and $\phi$ have this property, then $f(x) = \phi(x)f(0)$, so $f$ is completely determined by $\phi$ once $f(0)$ is given; moreover,

$$\phi(x)\phi(y)f(0) = \phi(x)f(y) = f(x + y) = \phi(x + y)f(0),$$

so that (unless $f = 0$) $\phi(x + y) = \phi(x)\phi(y)$. In short, to find all $f$'s that transform as described above, it suffices to find all $\phi$'s of absolute value one that satisfy the functional equation $\phi(x + y) = \phi(x)\phi(y)$. Upon imposing the natural requirement that $\phi$ should be measurable, we have a complete solution to this problem.


8.19 Theorem. If $\phi$ is a measurable function on $\mathbb{R}^n$ (resp. $\mathbb{T}^n$) such that $\phi(x + y) = \phi(x)\phi(y)$ and $|\phi| = 1$, there exists $\xi \in \mathbb{R}^n$ (resp. $\xi \in \mathbb{Z}^n$) such that $\phi(x) = e^{2\pi i\xi\cdot x}$.

Proof. We first prove this assertion on $\mathbb{R}$. Let $a \in \mathbb{R}$ be such that $\int_0^a \phi(t)\, dt \ne 0$; such an $a$ surely exists, for otherwise the Lebesgue differentiation theorem would imply that $\phi = 0$ a.e. Setting $A = (\int_0^a \phi(t)\, dt)^{-1}$, then, we have

$$\phi(x) = A\int_0^a \phi(x)\phi(t)\, dt = A\int_0^a \phi(x + t)\, dt = A\int_x^{x+a} \phi(t)\, dt.$$

Thus $\phi$, being the indefinite integral of a locally integrable function, is continuous; and then, being the integral of a continuous function, it is $C^1$. Moreover,

$$\phi'(x) = A[\phi(x + a) - \phi(x)] = B\phi(x), \quad \text{where } B = A[\phi(a) - 1].$$

It follows that $(d/dx)(e^{-Bx}\phi(x)) = 0$, so that $e^{-Bx}\phi(x)$ is constant. Since $\phi(0) = 1$ (because $\phi(0) = \phi(0)^2$ and $\phi(0) \ne 0$), we have $\phi(x) = e^{Bx}$, and since $|\phi| = 1$, $B$ is purely imaginary, so $B = 2\pi i\xi$ for some $\xi \in \mathbb{R}$. This completes the proof for $\mathbb{R}$; as for $\mathbb{T}$, the $\phi$ we have been considering will be periodic (with period 1) iff $e^{2\pi i\xi} = 1$ iff $\xi \in \mathbb{Z}$.

The $n$-dimensional case follows easily, for if $e_1, \ldots, e_n$ is the standard basis for $\mathbb{R}^n$, the functions $\psi_j(t) = \phi(te_j)$ satisfy $\psi_j(t + s) = \psi_j(t)\psi_j(s)$ on $\mathbb{R}$, so that $\psi_j(t) = e^{2\pi i\xi_j t}$, and hence

$$\phi(x) = \phi\Big(\sum_1^n x_je_j\Big) = \prod_1^n \psi_j(x_j) = e^{2\pi i\xi\cdot x}.$$
The idea now is to decompose more or less arbitrary functions on $\mathbb{T}^n$ or $\mathbb{R}^n$ in terms of the exponentials $e^{2\pi i\xi\cdot x}$. In the case of $\mathbb{T}^n$ this works out very simply for $L^2$ functions:

8.20 Theorem. Let $E_\kappa(x) = e^{2\pi i\kappa\cdot x}$. Then $\{E_\kappa : \kappa \in \mathbb{Z}^n\}$ is an orthonormal basis for $L^2(\mathbb{T}^n)$.

Proof. Verification of orthonormality is an easy exercise in calculus; by Fubini's theorem it boils down to the fact that $\int_0^1 e^{2\pi ikt}\, dt$ equals 1 if $k = 0$ and equals 0 otherwise. Next, since $E_\kappa E_\lambda = E_{\kappa+\lambda}$, the set of finite linear combinations of the $E_\kappa$'s is an algebra. It clearly separates points on $\mathbb{T}^n$; also, $E_0 = 1$ and $\overline{E_\kappa} = E_{-\kappa}$. Since $\mathbb{T}^n$ is compact, the Stone–Weierstrass theorem implies that this algebra is dense in $C(\mathbb{T}^n)$ in the uniform norm and hence in the $L^2$ norm, and $C(\mathbb{T}^n)$ is itself dense in $L^2(\mathbb{T}^n)$ by Proposition 7.9. It follows that $\{E_\kappa\}$ is a basis. ∎


To restate this result: If $f \in L^2(\mathbb{T}^n)$, we define its Fourier transform $\hat f$, a function on $\mathbb{Z}^n$, by

$$\hat f(\kappa) = \langle f, E_\kappa\rangle = \int_{\mathbb{T}^n} f(x)e^{-2\pi i\kappa\cdot x}\, dx,$$

and we call the series

$$\sum_{\kappa \in \mathbb{Z}^n} \hat f(\kappa)E_\kappa$$

the Fourier series of $f$. The term "Fourier transform" is also used to mean the map $f \mapsto \hat f$. Theorem 8.20 then says that the Fourier transform maps $L^2(\mathbb{T}^n)$ onto $l^2(\mathbb{Z}^n)$, that $\|\hat f\|_2 = \|f\|_2$ (Parseval's identity), and that the Fourier series of $f$ converges to $f$ in the $L^2$ norm. We shall consider the question of pointwise convergence in the next two sections.
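For a trigonometric polynomial, Parseval's identity can be checked by hand. The sketch below checks it numerically, with $n = 1$ and made-up coefficients. It uses $N$-point Riemann sums, which are exact (up to rounding) for trigonometric polynomials whose frequencies differ by less than $N$, because $\frac1N\sum_{s=0}^{N-1} e^{2\pi ims/N} = 0$ for $0 < |m| < N$. Both printed quantities should equal $\sum|c_k|^2 = 6.3125$. The names `COEFFS` and `fourier_coefficient` are hypothetical.

```python
import cmath
import math

N = 64  # number of sample points (assumption: larger than every frequency gap used below)

# hypothetical trigonometric polynomial f(x) = sum_k c_k e^{2 pi i k x} on T (coefficients made up)
COEFFS = {-2: 0.5, 0: 1.0, 1: 2.0 - 1.0j, 3: -0.25j}

def f(x):
    return sum(c * cmath.exp(2j * math.pi * k * x) for k, c in COEFFS.items())

def fourier_coefficient(func, kappa):
    """N-point Riemann sum for hat f(kappa) = ∫_0^1 f(x) e^{-2 pi i kappa x} dx.
    Exact up to rounding when every frequency k of func satisfies |k - kappa| < N."""
    return sum(func(s / N) * cmath.exp(-2j * math.pi * kappa * s / N) for s in range(N)) / N

hat = {kappa: fourier_coefficient(f, kappa) for kappa in range(-8, 9)}
coeff_side = sum(abs(c) ** 2 for c in hat.values())             # ||hat f||_2^2
function_side = sum(abs(f(s / N)) ** 2 for s in range(N)) / N   # ||f||_2^2 (also exact here)
print(coeff_side, function_side)
```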

Actually, the definition of $\hat f(\kappa)$ makes sense if $f$ is merely in $L^1(\mathbb{T}^n)$, and $|\hat f(\kappa)| \le \|f\|_1$, so the Fourier transform extends to a norm-decreasing map from $L^1(\mathbb{T}^n)$ to $l^\infty(\mathbb{Z}^n)$. (The Fourier series of an $L^1$ function may be quite badly behaved, but there are still methods for recovering $f$ from $\hat f$ when $f \in L^1$, as we shall see in the next section.) Interpolating between $L^1$ and $L^2$, we have the following result.

8.21 The Hausdorff–Young Inequality. Suppose that $1 \le p \le 2$ and $q$ is the conjugate exponent to $p$. If $f \in L^p(\mathbb{T}^n)$, then $\hat f \in l^q(\mathbb{Z}^n)$ and $\|\hat f\|_q \le \|f\|_p$.

Proof. Since $\|\hat f\|_\infty \le \|f\|_1$ for $f \in L^1$ and $\|\hat f\|_2 = \|f\|_2$ for $f \in L^2$, the assertion follows from the Riesz–Thorin interpolation theorem. ∎
The situation on $\mathbb{R}^n$ is more delicate. The formal analogue of Theorem 8.20 should be

$$f(x) = \int_{\mathbb{R}^n} \hat f(\xi)e^{2\pi i\xi\cdot x}\, d\xi, \quad \text{where } \hat f(\xi) = \int_{\mathbb{R}^n} f(x)e^{-2\pi i\xi\cdot x}\, dx.$$

These relations turn out to be valid when suitably interpreted, but some care is needed. In the first place, the integral defining $\hat f(\xi)$ is likely to diverge if $f \in L^2$. However, it certainly converges if $f \in L^1$. We therefore begin by defining the Fourier transform of $f \in L^1(\mathbb{R}^n)$ by

$$\mathcal{F}f(\xi) = \hat f(\xi) = \int_{\mathbb{R}^n} f(x)e^{-2\pi i\xi\cdot x}\, dx.$$

(We use the notation $\mathcal{F}$ for the Fourier transform only in certain situations where it is needed for clarity.) Clearly $\|\hat f\|_u \le \|f\|_1$, and $\hat f$ is continuous by Theorem 2.27; thus

$$\mathcal{F}: L^1(\mathbb{R}^n) \to BC(\mathbb{R}^n).$$

We summarize the elementary properties of $\mathcal{F}$ in a theorem.
We summarize the elementary properties of F in a theorem. 


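8.22 Theorem. Suppose $f, g \in L^1(\mathbb{R}^n)$.
a. $(\tau_y f)^\wedge(\xi) = e^{-2\pi i\xi\cdot y}\hat f(\xi)$ and $\tau_\eta(\hat f) = \hat h$ where $h(x) = e^{2\pi i\eta\cdot x}f(x)$.
b. If $T$ is an invertible linear transformation of $\mathbb{R}^n$ and $S = (T^*)^{-1}$ is its inverse transpose, then $(f \circ T)^\wedge = |\det T|^{-1}\hat f \circ S$. In particular, if $T$ is a rotation, then $(f \circ T)^\wedge = \hat f \circ T$; and if $Tx = t^{-1}x$ ($t > 0$), then $(f \circ T)^\wedge(\xi) = t^n\hat f(t\xi)$, so that $(f_t)^\wedge(\xi) = \hat f(t\xi)$ in the notation of (8.13).
c. $(f * g)^\wedge = \hat f\hat g$.
d. If $x^\alpha f \in L^1$ for $|\alpha| \le k$, then $\hat f \in C^k$ and $\partial^\alpha\hat f = [(-2\pi ix)^\alpha f]^\wedge$.
e. If $f \in C^k$, $\partial^\alpha f \in L^1$ for $|\alpha| \le k$, and $\partial^\alpha f \in C_0$ for $|\alpha| \le k - 1$, then $(\partial^\alpha f)^\wedge(\xi) = (2\pi i\xi)^\alpha\hat f(\xi)$.
f. (The Riemann–Lebesgue Lemma) $\mathcal{F}(L^1(\mathbb{R}^n)) \subset C_0(\mathbb{R}^n)$.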


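Proof. a. We have

$$(\tau_y f)^\wedge(\xi) = \int f(x - y)e^{-2\pi i\xi\cdot x}\, dx = \int f(x)e^{-2\pi i\xi\cdot(x + y)}\, dx = e^{-2\pi i\xi\cdot y}\hat f(\xi),$$

and similarly for the other formula.

b. By Theorem 2.44,

$$(f \circ T)^\wedge(\xi) = \int f(Tx)e^{-2\pi i\xi\cdot x}\, dx = |\det T|^{-1}\int f(x)e^{-2\pi i\xi\cdot T^{-1}x}\, dx = |\det T|^{-1}\int f(x)e^{-2\pi iS\xi\cdot x}\, dx = |\det T|^{-1}\hat f(S\xi).$$

c. By Fubini's theorem,

$$\begin{aligned} (f * g)^\wedge(\xi) &= \iint f(x - y)g(y)e^{-2\pi i\xi\cdot x}\, dy\, dx \\ &= \iint f(x - y)e^{-2\pi i\xi\cdot(x - y)}g(y)e^{-2\pi i\xi\cdot y}\, dx\, dy \\ &= \int \hat f(\xi)g(y)e^{-2\pi i\xi\cdot y}\, dy = \hat f(\xi)\hat g(\xi). \end{aligned}$$

d. By Theorem 2.27 and induction on $|\alpha|$,

$$\partial^\alpha\hat f(\xi) = \partial_\xi^\alpha\int f(x)e^{-2\pi i\xi\cdot x}\, dx = \int f(x)(-2\pi ix)^\alpha e^{-2\pi i\xi\cdot x}\, dx.$$

e. First assume $n = |\alpha| = 1$. Since $f \in C_0$, we can integrate by parts:

$$(f')^\wedge(\xi) = \int f'(x)e^{-2\pi i\xi x}\, dx = f(x)e^{-2\pi i\xi x}\Big|_{-\infty}^{\infty} - \int f(x)(-2\pi i\xi)e^{-2\pi i\xi x}\, dx = 2\pi i\xi\hat f(\xi).$$

The argument for $n > 1$, $|\alpha| = 1$ is the same (to compute $(\partial_j f)^\wedge$, integrate by parts in the $j$th variable), and the general case follows by induction on $|\alpha|$.

f. By (e), if $f \in C^1 \cap C_c$, then $|\xi|\,|\hat f(\xi)|$ is bounded and hence $\hat f \in C_0$. But the set of all such $f$'s is dense in $L^1$ by Proposition 8.17, and $\hat f_n \to \hat f$ uniformly whenever $f_n \to f$ in $L^1$. Since $C_0$ is closed in the uniform norm, the result follows. ∎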
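Theorem 8.22c can be checked numerically for $g(x) = e^{-\pi x^2}$ on $\mathbb{R}$. The sketch below is an illustration only. It replaces every integral by a Riemann sum on $[-6, 6]$ with spacing $0.02$. That choice is harmless here because Gaussians decay very fast and Riemann sums approximate their integrals extremely well. The sketch compares $(g * g)^\wedge(\xi)$ with $\hat g(\xi)^2$. Since $\hat g = g$ by Proposition 8.24 with $a = 1$, both should also agree with $e^{-2\pi\xi^2}$.

```python
import cmath
import math

H = 0.02                                  # grid spacing (an assumption, adequate for Gaussians)
XS = [-6.0 + k * H for k in range(601)]   # grid on [-6, 6]; everything below is negligible outside

def g(x):
    return math.exp(-math.pi * x * x)

def fourier(samples, xi):
    """Riemann-sum approximation of ∫ F(x) e^{-2 pi i xi x} dx from samples F(x_k), x_k in XS."""
    return sum(v * cmath.exp(-2j * math.pi * xi * x) for v, x in zip(samples, XS)) * H

g_samples = [g(x) for x in XS]
# samples of (g * g)(x) = ∫ g(x - y) g(y) dy, each integral computed by the same Riemann sum
conv_samples = [sum(g(x - y) * g(y) for y in XS) * H for x in XS]

for xi in (0.0, 0.3, 0.7):
    lhs = fourier(conv_samples, xi)       # (g * g)^(xi)
    rhs = fourier(g_samples, xi) ** 2     # (g^(xi))^2
    print(xi, abs(lhs - rhs))
```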


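Parts (d) and (e) of Theorem 8.22 point to a fundamental property of the Fourier transform: smoothness properties of $f$ are reflected in the rate of decay of $\hat f$ at infinity, and vice versa. Parts (a), (c), (e), and (f) of this theorem are valid also on $\mathbb{T}^n$, as is (b) provided that $T$ leaves the lattice $\mathbb{Z}^n$ invariant (Exercise 12).

8.23 Corollary. $\mathcal{F}$ maps the Schwartz class $\mathcal{S}$ continuously into itself.

Proof. If $f \in \mathcal{S}$, then $x^\alpha\partial^\beta f \in L^1 \cap C_0$ for all $\alpha, \beta$, so by Theorem 8.22d,e, $\hat f$ is $C^\infty$ and

$$(2\pi i\xi)^\alpha\partial^\beta\hat f = (2\pi i\xi)^\alpha[(-2\pi ix)^\beta f]^\wedge = [\partial^\alpha((-2\pi ix)^\beta f)]^\wedge.$$

Thus $\xi^\alpha\partial^\beta\hat f$ is bounded for all $\alpha, \beta$, whence $\hat f \in \mathcal{S}$ by Proposition 8.3. Moreover, since $\int(1 + |x|)^{-n-1}\, dx < \infty$,

$$\|\xi^\alpha\partial^\beta\hat f\|_u \le C\|\partial^\alpha(x^\beta f)\|_1 \le C'\sup_x\,(1 + |x|)^{n+1}|\partial^\alpha(x^\beta f)(x)|.$$

It then follows that $\|\hat f\|_{(N,\beta)} \le C_{N,\beta}\sum_{|\gamma| \le N}\|f\|_{(|\beta|+n+1,\gamma)}$ by the proof of Proposition 8.3, so the Fourier transform is continuous on $\mathcal{S}$. ∎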
At this point we need to compute an important specific Fourier transform.

8.24 Proposition. If $f(x) = e^{-\pi a|x|^2}$ where $a > 0$, then $\hat f(\xi) = a^{-n/2}e^{-\pi|\xi|^2/a}$.

Proof. First consider the case $n = 1$. Since the derivative of $e^{-\pi ax^2}$ is $-2\pi axe^{-\pi ax^2}$, by Theorem 8.22d,e we have

$$(\hat f)'(\xi) = [(-2\pi ix)f]^\wedge(\xi) = \frac{i}{a}(f')^\wedge(\xi) = \frac{i}{a}(2\pi i\xi)\hat f(\xi) = -\frac{2\pi\xi}{a}\hat f(\xi).$$

It follows that $(d/d\xi)(e^{\pi\xi^2/a}\hat f(\xi)) = 0$, so that $e^{\pi\xi^2/a}\hat f(\xi)$ is constant. To evaluate the constant, set $\xi = 0$ and use Proposition 2.53:

$$\hat f(0) = \int e^{-\pi ax^2}\, dx = a^{-1/2}.$$

The $n$-dimensional case follows by Fubini's theorem, since $|x|^2 = \sum_1^n x_j^2$:

$$\hat f(\xi) = \prod_1^n\int\exp(-\pi ax_j^2 - 2\pi i\xi_jx_j)\, dx_j = \prod_1^n\big[a^{-1/2}\exp(-\pi\xi_j^2/a)\big] = a^{-n/2}\exp(-\pi|\xi|^2/a).$$
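Proposition 8.24 can also be checked by direct quadrature for $n = 1$. The sketch below uses a midpoint rule on $[-8, 8]$; the Gaussians involved are negligible outside that interval. It compares the numerical value of $\int e^{-\pi ax^2}e^{-2\pi i\xi x}\,dx$ with $a^{-1/2}e^{-\pi\xi^2/a}$ for a few values of $a$ and $\xi$.

```python
import cmath
import math

def gaussian_ft_numeric(a, xi, L=8.0, m=4000):
    """Midpoint-rule approximation of ∫ e^{-pi a x^2} e^{-2 pi i xi x} dx over [-L, L] (n = 1)."""
    h = 2.0 * L / m
    total = 0j
    for k in range(m):
        x = -L + (k + 0.5) * h
        total += math.exp(-math.pi * a * x * x) * cmath.exp(-2j * math.pi * xi * x)
    return total * h

def gaussian_ft_formula(a, xi):
    # Proposition 8.24 with n = 1: a^{-1/2} exp(-pi xi^2 / a)
    return a ** -0.5 * math.exp(-math.pi * xi * xi / a)

for a in (0.5, 1.0, 2.0):
    for xi in (0.0, 0.4, 1.0):
        print(a, xi, abs(gaussian_ft_numeric(a, xi) - gaussian_ft_formula(a, xi)))
```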


We are now ready to invert the Fourier transform. If $f \in L^1$, we define

$$f^\vee(x) = \hat f(-x) = \int f(\xi)e^{2\pi i\xi\cdot x}\, d\xi,$$

and we claim that if $f \in L^1$ and $\hat f \in L^1$, then $(\hat f)^\vee = f$. A simple appeal to Fubini's theorem fails because the integrand in

$$(\hat f)^\vee(x) = \iint f(y)e^{2\pi i\xi\cdot(x - y)}\, dy\, d\xi$$

is not in $L^1(\mathbb{R}^n \times \mathbb{R}^n)$. The trick is to introduce a convergence factor and then pass to the limit, using Fubini's theorem via the following lemma:

8.25 Lemma. If $f, g \in L^1$, then $\int\hat fg = \int f\hat g$.

Proof. Both integrals are equal to $\iint f(x)g(\xi)e^{-2\pi i\xi\cdot x}\, dx\, d\xi$. ∎

8.26 The Fourier Inversion Theorem. If $f \in L^1$ and $\hat f \in L^1$, then $f$ agrees almost everywhere with a continuous function $f_0$, and $(\hat f)^\vee = (f^\vee)^\wedge = f_0$.
Proof. Given $t > 0$ and $x \in \mathbb{R}^n$, set

$$\phi(\xi) = \exp(2\pi i\xi\cdot x - \pi t^2|\xi|^2).$$

By Theorem 8.22a and Proposition 8.24,

$$\hat\phi(y) = t^{-n}\exp(-\pi|x - y|^2/t^2) = g_t(x - y),$$

where $g(x) = e^{-\pi|x|^2}$ and the subscript $t$ has the meaning in (8.13). By Lemma 8.25, then,

$$\int e^{-\pi t^2|\xi|^2}e^{2\pi i\xi\cdot x}\hat f(\xi)\, d\xi = \int\hat f\phi = \int f\hat\phi = f * g_t(x).$$

Since $\int e^{-\pi|x|^2}\, dx = 1$, by Theorem 8.14 we have $f * g_t \to f$ in the $L^1$ norm as $t \to 0$. On the other hand, since $\hat f \in L^1$ the dominated convergence theorem yields

$$\lim_{t \to 0}\int e^{-\pi t^2|\xi|^2}e^{2\pi i\xi\cdot x}\hat f(\xi)\, d\xi = \int\hat f(\xi)e^{2\pi i\xi\cdot x}\, d\xi = (\hat f)^\vee(x).$$

It follows (passing to a sequence $t_k \to 0$ along which $f * g_{t_k} \to f$ a.e.) that $f = (\hat f)^\vee$ a.e., and similarly $(f^\vee)^\wedge = f$ a.e. Since $(\hat f)^\vee$ and $(f^\vee)^\wedge$ are continuous, being Fourier transforms of $L^1$ functions, the proof is complete. ∎

8.27 Corollary. If $f \in L^1$ and $\hat f = 0$, then $f = 0$ a.e.

8.28 Corollary. $\mathcal{F}$ is an isomorphism of $\mathcal{S}$ onto itself.

Proof. By Corollary 8.23, $\mathcal{F}$ maps $\mathcal{S}$ continuously into itself, and hence so does $f \mapsto f^\vee$, since $f^\vee(x) = \hat f(-x)$. By the Fourier inversion theorem, these maps are inverse to each other. ∎


At last we are in a position to derive the analogue of Theorem 8.20 on $\mathbb{R}^n$.

8.29 The Plancherel Theorem. If $f \in L^1 \cap L^2$, then $\hat f \in L^2$; and $\mathcal{F}|(L^1 \cap L^2)$ extends uniquely to a unitary isomorphism on $L^2$.

Proof. Let $X = \{f \in L^1 : \hat f \in L^1\}$. Since $\hat f \in L^1$ implies $f \in L^\infty$, we have $X \subset L^2$ by Proposition 6.10, and $X$ is dense in $L^2$ because $\mathcal{S} \subset X$ and $\mathcal{S}$ is dense in $L^2$ by Proposition 8.17. Given $f, g \in X$, let $h = \overline{\hat g}$. By the inversion theorem,

$$\hat h(\xi) = \int\overline{\hat g(x)}e^{-2\pi i\xi\cdot x}\, dx = \overline{\int\hat g(x)e^{2\pi i\xi\cdot x}\, dx} = \overline{g(\xi)}.$$

Hence, by Lemma 8.25,

$$\int f\bar g = \int f\hat h = \int\hat fh = \int\hat f\,\overline{\hat g}.$$

Thus $\mathcal{F}|X$ preserves the $L^2$ inner product; in particular, by taking $g = f$, we obtain $\|\hat f\|_2 = \|f\|_2$. Since $\mathcal{F}(X) = X$ by the inversion theorem, $\mathcal{F}|X$ extends by continuity to a unitary isomorphism on $L^2$.

It remains only to show that this extension agrees with $\mathcal{F}$ on all of $L^1 \cap L^2$. But if $f \in L^1 \cap L^2$ and $g(x) = e^{-\pi|x|^2}$ as in the proof of the inversion theorem, we have $f * g_t \in L^1$ by Young's inequality and $(f * g_t)^\wedge \in L^1$ because $(f * g_t)^\wedge(\xi) = e^{-\pi t^2|\xi|^2}\hat f(\xi)$ and $\hat f$ is bounded. Hence $f * g_t \in X$; moreover, by Theorem 8.14, $f * g_t \to f$ in both the $L^1$ and $L^2$ norms. Therefore $(f * g_t)^\wedge \to \hat f$ both uniformly and in the $L^2$ norm, and we are done. ∎


We have thus extended the domain of the Fourier transform from $L^1$ to $L^1 + L^2$. Just as on $\mathbb{T}^n$, the Riesz–Thorin theorem yields the following result for the intermediate $L^p$ spaces:

8.30 The Hausdorff–Young Inequality. Suppose that $1 \le p \le 2$ and $q$ is the conjugate exponent to $p$. If $f \in L^p(\mathbb{R}^n)$, then $\hat f \in L^q(\mathbb{R}^n)$ and $\|\hat f\|_q \le \|f\|_p$.

If $f \in L^1$ and $\hat f \in L^1$, the inversion formula

$$f(x) = \int\hat f(\xi)e^{2\pi i\xi\cdot x}\, d\xi$$

exhibits $f$ as a superposition of the basic functions $e^{2\pi i\xi\cdot x}$; it is often called the Fourier integral representation of $f$. This formula remains valid in spirit for all $f \in L^2$, although the integral (as well as the integral defining $\hat f$) may not converge pointwise. The interpretation of the inversion formula will be studied further in the next section.

We conclude this section with a beautiful theorem that involves an interplay of Fourier series and Fourier integrals. To motivate it, consider the following problem: Given a function $f \in L^1(\mathbb{R}^n)$, how can one manufacture a periodic function (that is, a function on $\mathbb{T}^n$) from it? Two possible answers suggest themselves. One way is to "average" $f$ over all periods, producing the series $\sum_{k \in \mathbb{Z}^n} f(x - k)$. This series, if it converges, will surely define a periodic function. The other way is to restrict $\hat f$ to the lattice $\mathbb{Z}^n$ and use it to form a Fourier series $\sum_{\kappa \in \mathbb{Z}^n}\hat f(\kappa)e^{2\pi i\kappa\cdot x}$. The content of the following theorem is that these methods both work and both give the same answer.


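8.31 Theorem. If $f \in L^1(\mathbb{R}^n)$, the series $\sum_{k \in \mathbb{Z}^n}\tau_kf$ converges pointwise a.e. and in $L^1(\mathbb{T}^n)$ to a function $Pf$ such that $\|Pf\|_1 \le \|f\|_1$. Moreover, for $\kappa \in \mathbb{Z}^n$, $(Pf)^\wedge(\kappa)$ (Fourier transform on $\mathbb{T}^n$) equals $\hat f(\kappa)$ (Fourier transform on $\mathbb{R}^n$).

Proof. Let $Q = [-\frac12, \frac12)^n$. Then $\mathbb{R}^n$ is the disjoint union of the cubes $Q + k = \{x + k : x \in Q\}$, $k \in \mathbb{Z}^n$, so

$$\int_Q\sum_{k \in \mathbb{Z}^n}|f(x - k)|\, dx = \sum_{k \in \mathbb{Z}^n}\int_{Q - k}|f(x)|\, dx = \int_{\mathbb{R}^n}|f(x)|\, dx = \|f\|_1.$$

Now apply Theorem 2.25. First, it shows that the series $\sum\tau_kf$ converges a.e. and in $L^1(\mathbb{T}^n)$ to a function $Pf \in L^1(\mathbb{T}^n)$ such that $\|Pf\|_1 \le \|f\|_1$, since $\mathbb{T}^n$ is measure-theoretically identical to $Q$. Second, it yields

$$(Pf)^\wedge(\kappa) = \int_Q\sum_{k \in \mathbb{Z}^n}f(x - k)e^{-2\pi i\kappa\cdot x}\, dx = \sum_{k \in \mathbb{Z}^n}\int_{Q - k}f(x)e^{-2\pi i\kappa\cdot(x + k)}\, dx = \sum_{k \in \mathbb{Z}^n}\int_{Q - k}f(x)e^{-2\pi i\kappa\cdot x}\, dx = \hat f(\kappa),$$

since $e^{-2\pi i\kappa\cdot k} = 1$ for $\kappa, k \in \mathbb{Z}^n$. ∎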

If we impose conditions on $f$ to guarantee that the series in question converge absolutely, we obtain a more refined result.

8.32 The Poisson Summation Formula. Suppose $f \in C(\mathbb{R}^n)$ satisfies $|f(x)| \le C(1 + |x|)^{-n-\epsilon}$ and $|\hat f(\xi)| \le C(1 + |\xi|)^{-n-\epsilon}$ for some $C, \epsilon > 0$. Then

$$\sum_{k \in \mathbb{Z}^n}f(x + k) = \sum_{\kappa \in \mathbb{Z}^n}\hat f(\kappa)e^{2\pi i\kappa\cdot x},$$

where both series converge absolutely and uniformly on $\mathbb{T}^n$. In particular, taking $x = 0$,

$$\sum_{k \in \mathbb{Z}^n}f(k) = \sum_{\kappa \in \mathbb{Z}^n}\hat f(\kappa).$$

Proof. The absolute and uniform convergence of the series follows from the fact that $\sum_{k \in \mathbb{Z}^n}(1 + |k|)^{-n-\epsilon} < \infty$, which can be seen by comparing the latter series to the convergent integral $\int(1 + |x|)^{-n-\epsilon}\, dx$. Thus the function $Pf = \sum\tau_kf$ is in $C(\mathbb{T}^n)$ and hence in $L^2(\mathbb{T}^n)$, so Theorem 8.35 implies that the series $\sum\hat f(\kappa)e^{2\pi i\kappa\cdot x}$ converges in $L^2(\mathbb{T}^n)$ to $Pf$. Since it also converges uniformly, its sum equals $Pf$ pointwise. (The replacement of $k$ by $-k$ in the formula for $Pf$ is immaterial since the sum is over all $k \in \mathbb{Z}^n$.) ∎
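The simplest instance of the Poisson summation formula uses the Gaussian of Proposition 8.24. With $f(x) = e^{-\pi ax^2}$ and $n = 1$, it reads

$$\sum_k e^{-\pi a(x+k)^2} = a^{-1/2}\sum_\kappa e^{-\pi\kappa^2/a}e^{2\pi i\kappa x}.$$

The sketch below evaluates both sides with the sums truncated to $|k| \le 50$, which is far more than enough for the values of $a$ used. Since $\hat f$ is even, the right side is real, so only its cosine part is summed.

```python
import math

def theta_sides(a, x, K=50):
    """Both sides of the Poisson summation formula for f(x) = e^{-pi a x^2} (n = 1), using
    hat f(xi) = a^{-1/2} e^{-pi xi^2 / a}; sums truncated to |k| <= K."""
    lhs = sum(math.exp(-math.pi * a * (x + k) ** 2) for k in range(-K, K + 1))
    rhs = sum(a ** -0.5 * math.exp(-math.pi * k * k / a) * math.cos(2 * math.pi * k * x)
              for k in range(-K, K + 1))
    return lhs, rhs

for a in (0.3, 1.0, 2.5):
    for x in (0.0, 0.25):
        lhs, rhs = theta_sides(a, x)
        print(a, x, lhs, rhs)
```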


Exercises

12. Work out the analogue of Theorem 8.22 for the Fourier transform on $\mathbb{T}^n$.

13. Let $f(x) = \frac12 - x$ on the interval $[0, 1)$, and extend $f$ to be periodic on $\mathbb{R}$.
a. $\hat f(0) = 0$ and $\hat f(\kappa) = (2\pi i\kappa)^{-1}$ if $\kappa \ne 0$.
b. $\sum_1^\infty k^{-2} = \pi^2/6$. (Use the Parseval identity.)

14. (Wirtinger's Inequality) If $f \in C^1([a, b])$ and $f(a) = f(b) = 0$, then

$$\int_a^b |f(x)|^2\, dx \le \Big(\frac{b - a}{\pi}\Big)^2\int_a^b |f'(x)|^2\, dx.$$

(By a change of variable it suffices to assume $a = 0$, $b = \frac12$. Extend $f$ to $[-\frac12, \frac12]$ by setting $f(-x) = -f(x)$, and then extend $f$ to be periodic on $\mathbb{R}$. Check that $f$, thus extended, is in $C^1(\mathbb{T})$ and apply the Parseval identity.)


15. Let $\operatorname{sinc} x = (\sin\pi x)/\pi x$ ($\operatorname{sinc} 0 = 1$).
a. If $a > 0$, $\hat\chi_{[-a,a]}(x) = \chi_{[-a,a]}^\vee(x) = 2a\operatorname{sinc} 2ax$.
b. Let $H_a = \{f \in L^2 : \hat f(\xi) = 0 \text{ (a.e.) for } |\xi| > a\}$. Then $H_a$ is a Hilbert space and $\{\sqrt{2a}\operatorname{sinc}(2ax - k) : k \in \mathbb{Z}\}$ is an orthonormal basis for $H_a$.
c. (The Sampling Theorem) If $f \in H_a$, then $f \in C_0$ (after modification on a null set), and $f(x) = \sum_{-\infty}^\infty f(k/2a)\operatorname{sinc}(2ax - k)$, where the series converges both uniformly and in $L^2$. (In the terminology of signal analysis, a signal of bandwidth $2a$ is completely determined by sampling its values at a sequence of points $\{k/2a\}$ whose spacing is the reciprocal of the bandwidth.)

16. Let $f_k = \chi_{[-1,1]} * \chi_{[-k,k]}$.
a. Compute $f_k(x)$ explicitly and show that $\|f_k\|_u = 2$.
b. $f_k^\vee(x) = (\pi x)^{-2}\sin 2\pi kx\,\sin 2\pi x$, and $\|f_k^\vee\|_1 \to \infty$ as $k \to \infty$. (Use Exercise 15a, and substitute $y = 2\pi kx$ in the integral defining $\|f_k^\vee\|_1$.)
c. $\mathcal{F}(L^1)$ is a proper subset of $C_0$. (Consider $g_k = f_k^\vee$ and use the open mapping theorem.)

17. Given $a > 0$, let $f(x) = e^{-2\pi x}x^{a-1}$ for $x > 0$ and $f(x) = 0$ for $x \le 0$.
a. $f \in L^1$, and $f \in L^2$ if $a > \frac12$.
b. $\hat f(\xi) = \Gamma(a)[2\pi(1 + i\xi)]^{-a}$. (Here we are using the branch of $z^a$ in the right half plane that is positive when $z$ is positive. Cauchy's theorem may be used to justify the complex substitution $y = (1 + i\xi)x$ in the integral defining $\hat f$.)
c. If $a, b > \frac12$ then

$$\int_{-\infty}^{\infty}(1 - ix)^{-a}(1 + ix)^{-b}\, dx = \frac{2^{2-a-b}\pi\,\Gamma(a + b - 1)}{\Gamma(a)\Gamma(b)}.$$

18. Suppose $f \in L^2(\mathbb{R})$.
a. The $L^2$ derivative $f'$ (in the sense of Exercises 8 and 9) exists iff $\xi\hat f \in L^2$, in which case $(f')^\wedge(\xi) = 2\pi i\xi\hat f(\xi)$.
b. If the $L^2$ derivative $f'$ exists, then

$$\Big(\int|f(x)|^2\, dx\Big)^2 \le 4\int|xf(x)|^2\, dx\int|f'(x)|^2\, dx.$$

(If the integrals on the right are finite, one can integrate by parts to obtain $\int|f|^2 = -2\operatorname{Re}\int xf\overline{f'}$.)
c. (Heisenberg's Inequality) For any $b, \beta \in \mathbb{R}$,

$$\frac{\|f\|_2^4}{16\pi^2} \le \int(x - b)^2|f(x)|^2\, dx\int(\xi - \beta)^2|\hat f(\xi)|^2\, d\xi.$$

(The inequality is trivial if either integral on the right is infinite; if not, reduce to the case $b = \beta = 0$ by considering $g(x) = e^{-2\pi i\beta x}f(x + b)$.) This inequality, a form of the quantum uncertainty principle, says that $f$ and $\hat f$ cannot both be sharply localized about single points $b$ and $\beta$.


19. (A variation on the theme of Exercise 18) If $f \in L^2(\mathbb{R}^n)$ and the set $S = \{x : f(x) \ne 0\}$ has finite measure, then for any measurable $E \subset \mathbb{R}^n$, $\int_E|\hat f|^2 \le \|f\|_2^2\, m(S)m(E)$.

20. If $f \in L^1(\mathbb{R}^{n+m})$, define $Pf(x) = \int f(x, y)\, dy$. (Here $x \in \mathbb{R}^n$ and $y \in \mathbb{R}^m$.) Then $Pf \in L^1(\mathbb{R}^n)$, $\|Pf\|_1 \le \|f\|_1$, and $(Pf)^\wedge(\xi) = \hat f(\xi, 0)$.

21. State and prove a result that encompasses both Theorem 8.31 and Exercise 20, in the setting of Fourier transforms on closed subgroups and quotient groups of $\mathbb{R}^n$.











22. Since $\mathcal{F}$ commutes with rotations, the Fourier transform of a radial function is radial; that is, if $F \in L^1(\mathbb{R}^n)$ and $F(x) = f(|x|)$, then $\hat F(\xi) = g(|\xi|)$, where $f$ and $g$ are related as follows.
a. Let $J(\xi) = \int_S e^{i\xi\cdot x}\,d\sigma(x)$, where $\sigma$ is surface measure on the unit sphere $S$ in $\mathbb{R}^n$ (Theorem 2.49). Then $J$ is radial, say $J(\xi) = j(|\xi|)$, and $g(\rho) = \int_0^\infty j(2\pi r\rho)f(r)r^{n-1}\,dr$.
b. $\sum_1^n \partial_k^2 J + J = 0$.
c. $j$ satisfies $\rho j''(\rho) + (n-1)j'(\rho) + \rho j(\rho) = 0$. (This equation is a variant of Bessel’s equation. The function $j$ is completely determined by the fact that it is a solution of this equation, is smooth at $\rho = 0$, and satisfies $j(0) = \sigma(S) = 2\pi^{n/2}/\Gamma(n/2)$. In fact, $j(\rho) = (2\pi)^{n/2}\rho^{(2-n)/2}J_{(n-2)/2}(\rho)$, where $J_\alpha$ is the Bessel function of the first kind of order $\alpha$.)
d. If $n = 3$, $j(\rho) = 4\pi\rho^{-1}\sin\rho$. (Set $f(\rho) = \rho j(\rho)$ and use (c) to show that $f'' + f = 0$. Alternatively, use spherical coordinates to compute the integral defining $J(0,0,\rho)$ directly.)
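The second route suggested in part (d) can be carried out numerically. This sketch evaluates $J(0,0,\rho)$ in spherical coordinates on $S^2$, assuming the standard parametrization $d\sigma = \sin\theta\,d\theta\,d\varphi$, and compares the result with $4\pi\rho^{-1}\sin\rho$. The function name `j_three` and the Simpson step count are our own choices.

```python
import math


def j_three(rho, n=4000):
    """j(rho) = J(0, 0, rho) for n = 3, where J(xi) = int_S exp(i xi . x) d sigma(x).

    In spherical coordinates this is 2*pi * int_0^pi exp(i rho cos th) sin th dth.
    The imaginary part vanishes by the symmetry th -> pi - th, so only the cosine
    part is integrated (Simpson's rule with an even number n of subintervals)."""
    h = math.pi / n

    def f(th):
        return math.cos(rho * math.cos(th)) * math.sin(th)

    s = f(0.0) + f(math.pi) + sum((4 if i % 2 else 2) * f(i * h) for i in range(1, n))
    return 2 * math.pi * s * h / 3


if __name__ == "__main__":
    for rho in (0.5, 2.0, 7.3):
        print(rho, j_three(rho), 4 * math.pi * math.sin(rho) / rho)
```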


23. In this exercise we develop the theory of Hermite functions. 
a. Define operators $T, T^*$ on $\mathcal{S}(\mathbb{R})$ by $Tf(x) = 2^{-1/2}[xf(x) - f'(x)]$ and $T^*f(x) = 2^{-1/2}[xf(x) + f'(x)]$. Then $\int (Tf)g = \int f(T^*g)$ and $T^*T - TT^* = I$.
b. Let $h_0(x) = \pi^{-1/4}e^{-x^2/2}$, and for $k \ge 1$ let $h_k = (k!)^{-1/2}T^kh_0$. ($h_k$ is the $k$th normalized Hermite function.) We have $Th_k = \sqrt{k+1}\,h_{k+1}$ and $T^*h_k = \sqrt{k}\,h_{k-1}$, and hence $TT^*h_k = kh_k$.
c. Let $S = 2TT^* + I$. Then $Sf(x) = x^2f(x) - f''(x)$ and $Sh_k = (2k+1)h_k$. ($S$ is called the Hermite operator.)
d. $\{h_k\}_0^\infty$ is an orthonormal set in $L^2(\mathbb{R})$. (Check directly that $\|h_0\|_2 = 1$, then observe that for $k > 0$, $\int h_kh_m = k^{-1}\int (TT^*h_k)h_m$ and use (a) and (b).)
e. We have

$$T^kf(x) = (-1)^k2^{-k/2}e^{x^2/2}\Big(\frac{d}{dx}\Big)^k\big[e^{-x^2/2}f(x)\big]$$

(use induction on $k$), and in particular,

$$h_k(x) = \frac{(-1)^ke^{x^2/2}}{(2^kk!\sqrt{\pi})^{1/2}}\Big(\frac{d}{dx}\Big)^ke^{-x^2}.$$


f. Let $H_k(x) = e^{x^2/2}h_k(x)$. Then $H_k$ is a polynomial of degree $k$, called the $k$th normalized Hermite polynomial. The linear span of $H_0, \ldots, H_m$ is the set of all polynomials of degree $\le m$. (The $k$th Hermite polynomial as usually defined is $[\pi^{1/2}2^kk!]^{1/2}H_k$.)
g. $\{h_k\}_0^\infty$ is an orthonormal basis for $L^2(\mathbb{R})$. (Suppose $f \perp h_k$ for all $k$, and let $g(x) = f(x)e^{-x^2/2}$. Show that $\hat g = 0$ by expanding $e^{-2\pi i\xi x}$ in its Maclaurin series and using (f).)
h. Define $A : L^2 \to L^2$ by $Af(x) = (2\pi)^{1/4}f(x\sqrt{2\pi})$, and define $\tilde f = A^{-1}\widehat{(Af)}$ for $f \in L^2$. Then $A$ is unitary and $\tilde f(\xi) = (2\pi)^{-1/2}\int f(x)e^{-i\xi x}\,dx$. Moreover, $(Tf)^\sim = -iT(\tilde f)$ for $f \in \mathcal{S}$, and $\tilde h_0 = h_0$; hence $\tilde h_k = (-i)^kh_k$. Therefore, if $\phi_k = Ah_k$, then $\{\phi_k\}_0^\infty$ is an orthonormal basis for $L^2$ consisting of eigenfunctions for $\mathcal{F}$; namely, $\hat\phi_k = (-i)^k\phi_k$.
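Parts (d) and (h) can be checked numerically. The $h_k$ are generated from the recurrence $h_{k+1} = \sqrt{2/(k+1)}\,xh_k - \sqrt{k/(k+1)}\,h_{k-1}$. This follows from part (b): adding $Th_k = \sqrt{k+1}\,h_{k+1}$ and $T^*h_k = \sqrt{k}\,h_{k-1}$ gives $\sqrt2\,xh_k = \sqrt{k+1}\,h_{k+1} + \sqrt{k}\,h_{k-1}$. The sketch truncates integrals over $\mathbb{R}$ to $[-12, 12]$ and replaces them by Riemann sums. The grid and the function names are illustrative choices, not from the text.

```python
import cmath
import math


def hermite_functions(kmax, x):
    """Return [h_0(x), ..., h_kmax(x)] via the three-term recurrence
    h_{k+1} = sqrt(2/(k+1)) x h_k - sqrt(k/(k+1)) h_{k-1}, h_0 = pi^{-1/4} e^{-x^2/2}."""
    h = [math.pi ** -0.25 * math.exp(-x * x / 2)]
    for k in range(kmax):
        prev = h[k - 1] if k > 0 else 0.0
        h.append(math.sqrt(2.0 / (k + 1)) * x * h[k] - math.sqrt(k / (k + 1)) * prev)
    return h


KMAX, L, N = 6, 12.0, 4800
DX = 2 * L / N
XS = [-L + i * DX for i in range(N + 1)]
TABLE = [hermite_functions(KMAX, x) for x in XS]  # TABLE[i][k] = h_k(XS[i])


def inner(j, k):
    """Riemann-sum approximation to int h_j(x) h_k(x) dx over [-L, L]."""
    return DX * sum(row[j] * row[k] for row in TABLE)


def tilde_transform(k, xi):
    """Approximate (2 pi)^{-1/2} int h_k(x) e^{-i xi x} dx, the transform of part (h)."""
    s = sum(row[k] * cmath.exp(-1j * xi * x) for row, x in zip(TABLE, XS))
    return s * DX / math.sqrt(2 * math.pi)


if __name__ == "__main__":
    xi = 1.3
    for k in range(KMAX + 1):
        print(k, tilde_transform(k, xi), (-1j) ** k * hermite_functions(KMAX, xi)[k])
```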


8.4 SUMMATION OF FOURIER INTEGRALS AND SERIES 


The Fourier inversion theorem shows how to express a function $f$ on $\mathbb{R}^n$ in terms of $\hat f$ provided that $f$ and $\hat f$ are in $L^1$. The same result holds for periodic functions. Namely, if $f \in L^1(\mathbb{T}^n)$ and $\hat f \in l^1(\mathbb{Z}^n)$, then the Fourier series $\sum \hat f(\kappa)e^{2\pi i\kappa\cdot x}$ converges absolutely and uniformly to a function $g$. Since $l^1 \subset l^2$, it follows from Theorem 8.20 that $f \in L^2$ and that the series converges to $f$ in the $L^2$ norm. Hence $f = g$ a.e., and $f = g$ everywhere if $f$ is assumed continuous at the outset.

Two questions therefore arise. What conditions on $f$ will guarantee that $\hat f$ is integrable? And how can $f$ be recovered from $\hat f$ if $\hat f$ is not integrable?

As for the first question, since $\hat f$ is bounded for $f \in L^1$, the issue is the decay of $\hat f$ at infinity, and this is related to the smoothness properties of $f$. For example, by Theorem 8.22e, if $f \in C^{n+1}(\mathbb{R}^n)$ and $\partial^\alpha f \in L^1 \cap C_0$ for $|\alpha| \le n+1$, then $|\hat f(\xi)| \le C(1+|\xi|)^{-n-1}$ and hence $\hat f \in L^1(\mathbb{R}^n)$ by Corollary 2.52. The same result holds for periodic functions, for the same reason: If $f \in C^{n+1}(\mathbb{T}^n)$, then $|\hat f(\kappa)| \le C(1+|\kappa|)^{-n-1}$ and hence $\hat f \in l^1(\mathbb{Z}^n)$.

Obtaining sharper results when $n > 1$ requires a generalized notion of partial derivatives, so we shall postpone this task until §9.3. (See Theorem 9.17.) However, for $n = 1$ we can easily obtain a better theorem that covers the useful case of functions that are continuous and piecewise $C^1$. We state it for periodic functions and leave the nonperiodic case to the reader (Exercise 24).


8.33 Theorem. Suppose that $f$ is periodic and absolutely continuous on $\mathbb{R}$, and that $f' \in L^p(\mathbb{T})$ for some $p > 1$. Then $\hat f \in l^1(\mathbb{Z})$.

Proof. Since $p > 1$, we have $C_p = \sum_1^\infty \kappa^{-p} < \infty$; and since $L^p(\mathbb{T}) \subset L^2(\mathbb{T})$ for $p > 2$, we may assume that $p \le 2$. Integration by parts (Theorem 3.36) shows that $\widehat{f'}(\kappa) = 2\pi i\kappa\hat f(\kappa)$. Hence, by the inequalities of Hölder and Hausdorff–Young, if $q$ is the conjugate exponent to $p$,

$$\sum_{\kappa\ne0}|\hat f(\kappa)| \le \Big[\sum_{\kappa\ne0}(2\pi|\kappa|)^{-p}\Big]^{1/p}\Big[\sum_{\kappa\ne0}|2\pi\kappa\hat f(\kappa)|^q\Big]^{1/q} \le \frac{(2C_p)^{1/p}}{2\pi}\|\widehat{f'}\|_q \le \frac{(2C_p)^{1/p}}{2\pi}\|f'\|_p.$$

Adding $|\hat f(0)|$ to both sides, we see that $\|\hat f\|_1 < \infty$. ∎


We now turn to the problem of recovering $f$ from $\hat f$ under minimal hypotheses on $f$, and we consider first the case of $\mathbb{R}^n$. The proof of the Fourier inversion theorem contains the essential idea: Replace the divergent integral $\int \hat f(\xi)e^{2\pi i\xi\cdot x}\,d\xi$ by $\int \hat f(\xi)\Phi(t\xi)e^{2\pi i\xi\cdot x}\,d\xi$, where $\Phi$ is a continuous function that vanishes rapidly enough at infinity to make the integral converge. If we choose $\Phi$ to satisfy $\Phi(0) = 1$, then $\Phi(t\xi) \to 1$ as $t \to 0$, and with any luck the corresponding integral will converge to $f$ in some sense. One $\Phi$ that works is the function $\Phi(\xi) = e^{-\pi|\xi|^2}$ used in the proof of the inversion theorem, but we shall see below that there are others of independent interest. We therefore formulate a fairly general theorem, for which we need the following lemma that complements Theorem 8.22c.


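8.34 Lemma. If $f, g \in L^2(\mathbb{R}^n)$, then $(\hat f\hat g)^\vee = f*g$.

Proof. $\hat f\hat g \in L^1$ by Plancherel’s theorem and Hölder’s inequality, so $(\hat f\hat g)^\vee$ makes sense. Given $x \in \mathbb{R}^n$, let $h(y) = \overline{g(x-y)}$. It is easily verified that $\hat h(\xi) = \overline{\hat g(\xi)}e^{-2\pi i\xi\cdot x}$, so since $\mathcal{F}$ is unitary on $L^2$,

$$f*g(x) = \int f\bar h = \int \hat f\,\overline{\hat h} = \int \hat f(\xi)\hat g(\xi)e^{2\pi i\xi\cdot x}\,d\xi = (\hat f\hat g)^\vee(x). \quad ∎$$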


8.35 Theorem. Suppose that $\Phi \in L^1 \cap C_0$, $\Phi(0) = 1$, and $\phi = \Phi^\vee \in L^1$. Given $f \in L^1 + L^2$, for $t > 0$ set

$$f^t(x) = \int \hat f(\xi)\Phi(t\xi)e^{2\pi i\xi\cdot x}\,d\xi.$$

a. If $f \in L^p$ ($1 \le p < \infty$), then $f^t \in L^p$ and $\|f^t - f\|_p \to 0$ as $t \to 0$.

b. If $f$ is bounded and uniformly continuous, then so is $f^t$, and $f^t \to f$ uniformly as $t \to 0$.

c. Suppose also that $|\phi(x)| \le C(1+|x|)^{-n-\epsilon}$ for some $C, \epsilon > 0$. Then $f^t(x) \to f(x)$ for every $x$ in the Lebesgue set of $f$.

Proof. We have $f = f_1 + f_2$ where $f_1 \in L^1$ and $f_2 \in L^2$. Since $\hat f_1 \in L^\infty$, $\hat f_2 \in L^2$, and $\Phi \in (L^1 \cap C_0) \subset (L^1 \cap L^2)$, the integral defining $f^t$ converges absolutely for every $x$. Moreover, if $\phi_t(x) = t^{-n}\phi(t^{-1}x)$, we have $\Phi(t\xi) = \hat\phi_t(\xi)$ by the inversion theorem and Theorem 8.22b, and $\int \phi(x)\,dx = \Phi(0) = 1$. Since $\phi_t \in L^1$ we have $f_1*\phi_t \in L^1$ and $\hat f_1\Phi(t\,\cdot) \in L^1$, so by Theorem 8.22c and the inversion formula,

$$\int \hat f_1(\xi)\Phi(t\xi)e^{2\pi i\xi\cdot x}\,d\xi = f_1*\phi_t(x).$$

Also, $\phi \in L^2$ by the Plancherel theorem, so by Lemma 8.34,

$$\int \hat f_2(\xi)\Phi(t\xi)e^{2\pi i\xi\cdot x}\,d\xi = f_2*\phi_t(x).$$

In short, $f^t = f*\phi_t$, so the assertions follow from Theorems 8.14 and 8.15. ∎


By combining this theorem with the Poisson summation formula, we obtain a 
corresponding result for periodic functions. 


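8.36 Theorem. Suppose that $\Phi \in C(\mathbb{R}^n)$ satisfies $|\Phi(\xi)| \le C(1+|\xi|)^{-n-\epsilon}$, $|\Phi^\vee(x)| \le C(1+|x|)^{-n-\epsilon}$, and $\Phi(0) = 1$. Given $f \in L^1(\mathbb{T}^n)$, for $t > 0$ set

$$f^t(x) = \sum_{\kappa\in\mathbb{Z}^n}\hat f(\kappa)\Phi(t\kappa)e^{2\pi i\kappa\cdot x}$$

(which converges absolutely since $\sum|\Phi(t\kappa)| < \infty$).
a. If $f \in L^p(\mathbb{T}^n)$ ($1 \le p < \infty$), then $\|f^t - f\|_p \to 0$ as $t \to 0$, and if $f \in C(\mathbb{T}^n)$, then $f^t \to f$ uniformly as $t \to 0$.
b. $f^t(x) \to f(x)$ for every $x$ in the Lebesgue set of $f$.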


Proof. Let $\phi = \Phi^\vee$ and $\phi_t(x) = t^{-n}\phi(t^{-1}x)$. Then $\hat\phi_t(\xi) = \Phi(t\xi)$, and $\phi_t$ satisfies the hypotheses of the Poisson summation formula, so

$$\sum_{k\in\mathbb{Z}^n}\phi_t(x-k) = \sum_{\kappa\in\mathbb{Z}^n}\Phi(t\kappa)e^{2\pi i\kappa\cdot x}.$$

Let us denote the common value of these sums by $\psi_t(x)$. Then

$$\widehat{f*\psi_t}(\kappa) = \hat f(\kappa)\hat\psi_t(\kappa) = \hat f(\kappa)\Phi(t\kappa) = \widehat{f^t}(\kappa),$$

so $f^t = f*\psi_t$. Hence, by Young’s inequality and Theorem 8.31 we have

$$\|f^t\|_p \le \|\psi_t\|_1\|f\|_p \le \|\phi_t\|_1\|f\|_p = \|\phi\|_1\|f\|_p,$$

so the operators $f \mapsto f^t$ are uniformly bounded on $L^p$, $1 \le p \le \infty$.

Now, since $\Phi$ is continuous and $\Phi(0) = 1$, we clearly have $f^t \to f$ uniformly (and hence in $L^p(\mathbb{T}^n)$) if $f$ is a trigonometric polynomial, that is, if $\hat f(\kappa) = 0$ for all but finitely many $\kappa$. But the trigonometric polynomials are dense in $C(\mathbb{T}^n)$ in the





260 ELEMENTS OF FOURIER ANALYSIS 


uniform norm by the Stone–Weierstrass theorem, and hence also dense in $L^p(\mathbb{T}^n)$ in the $L^p$ norm for $p < \infty$. Assertion (a) therefore follows from Proposition 5.17.

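To prove (b), suppose that $x$ is in the Lebesgue set of $f$; by translating $f$ we may assume that $x = 0$, which simplifies the notation. With $Q = [-\frac12, \frac12)^n$, we have

$$f^t(0) = f*\psi_t(0) = \int_Q f(x)\phi_t(-x)\,dx + \int_Q f(x)\sum_{k\ne0}\phi_t(-x+k)\,dx.$$

Since

$$|\phi_t(-x+k)| = t^{-n}|\phi(t^{-1}(k-x))| \le Ct^{-n}\big(t^{-1}|k-x|\big)^{-n-\epsilon} = Ct^{\epsilon}|k-x|^{-n-\epsilon}$$

and $|k-x| \ge \frac12|k|$ for $x \in Q$ and $k \ne 0$, we have $|\phi_t(-x+k)| \le C2^{n+\epsilon}t^\epsilon|k|^{-n-\epsilon}$, and hence

$$\sum_{k\ne0}\int_Q|f(x)\phi_t(-x+k)|\,dx \le C2^{n+\epsilon}t^\epsilon\sum_{k\ne0}|k|^{-n-\epsilon}\int_Q|f(x)|\,dx,$$

which vanishes as $t \to 0$. On the other hand, if we define $g = f\chi_Q \in L^1(\mathbb{R}^n)$, then 0 is in the Lebesgue set of $g$ (because 0 is in the interior of $Q$, and the condition that 0 be in the Lebesgue set of $g$ depends only on the behavior of $g$ near 0), so by Theorem 8.15,

$$\lim_{t\to0}\int_Q f(x)\phi_t(-x)\,dx = \lim_{t\to0}g*\phi_t(0) = g(0) = f(0). \quad ∎$$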


Let us examine some specific examples of functions $\Phi$ that can be used in Theorems 8.35 and 8.36. The first is the one already used in the proof of the inversion theorem,

$$\Phi(\xi) = e^{-\pi|\xi|^2}, \qquad \phi(x) = \Phi^\vee(x) = e^{-\pi|x|^2}.$$


This $\phi$ is called the Gauss kernel or Weierstrass kernel. It is important for a number of reasons, including its connection with the heat equation that we shall explain in §8.7. When $n = 1$, its periodized version

$$\psi_t(x) = \sum_{k\in\mathbb{Z}}t^{-1}e^{-\pi(x-k)^2/t^2} = \sum_{\kappa\in\mathbb{Z}}e^{-\pi t^2\kappa^2}e^{2\pi i\kappa x},$$

in terms of which the $f^t$ in Theorem 8.36 is given by $f^t = f*\psi_t$, is essentially one of the Jacobi theta functions, which are connected with elliptic functions and have applications in number theory.
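The two expressions for the periodized Gauss kernel are an instance of the Poisson summation formula, and they can be compared numerically. This sketch truncates both sums to $|k| \le 50$, which is ample for the values of $t$ tested. The function names are ours.

```python
import math


def psi_translates(x, t, K=50):
    """Periodized Gauss kernel as a sum of translates: sum_k t^{-1} exp(-pi (x - k)^2 / t^2)."""
    return sum(math.exp(-math.pi * (x - k) ** 2 / t ** 2) for k in range(-K, K + 1)) / t


def psi_fourier(x, t, K=50):
    """The same function as a Fourier series sum_kappa exp(-pi t^2 kappa^2) e^{2 pi i kappa x};
    the coefficients are even in kappa, so the series is written with cosines."""
    return 1.0 + 2.0 * sum(
        math.exp(-math.pi * t * t * k * k) * math.cos(2 * math.pi * k * x)
        for k in range(1, K + 1)
    )


if __name__ == "__main__":
    for t in (0.2, 1.0, 3.0):
        print(t, psi_translates(0.13, t), psi_fourier(0.13, t))
```

Small $t$ makes the translate sum converge fastest, large $t$ the Fourier series; comparing the two is a cheap test of the identity.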

The second example is $\Phi(\xi) = e^{-2\pi|\xi|}$, whose inverse Fourier transform $\phi$ is called the Poisson kernel on $\mathbb{R}^n$. When $n = 1$, we have

$$\phi(x) = \int_{-\infty}^0 e^{2\pi\xi(1+ix)}\,d\xi + \int_0^\infty e^{2\pi\xi(-1+ix)}\,d\xi = \frac{1}{2\pi}\Big[\frac{1}{1+ix} + \frac{1}{1-ix}\Big] = \frac{1}{\pi(1+x^2)}. \tag{8.37}$$

The formula for $\phi$ in higher dimensions is worked out in Exercise 26; it turns out that $\phi(x)$ is a constant multiple of $(1+|x|^2)^{-(n+1)/2}$. Like the Gauss kernel, the Poisson kernel has an interpretation in terms of partial differential equations that we shall explain in §8.7.
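Formula (8.37) can be checked by direct numerical integration. Because $e^{-2\pi|\xi|}$ is even, the inverse transform reduces to $2\int_0^\infty e^{-2\pi\xi}\cos(2\pi\xi x)\,d\xi$. This sketch truncates that integral at $\xi = 10$ (where the integrand is below $10^{-27}$) and uses Simpson's rule. The truncation point, step count, and names are illustrative assumptions.

```python
import math


def poisson_kernel_numeric(x, L=10.0, n=20000):
    """phi(x) = int e^{-2 pi |xi|} e^{2 pi i xi x} d xi = 2 int_0^inf e^{-2 pi xi} cos(2 pi xi x) d xi,
    truncated to [0, L] and evaluated with Simpson's rule (n even)."""
    h = L / n

    def f(s):
        return math.exp(-2 * math.pi * s) * math.cos(2 * math.pi * s * x)

    total = f(0.0) + f(L) + sum((4 if i % 2 else 2) * f(i * h) for i in range(1, n))
    return 2 * total * h / 3


if __name__ == "__main__":
    for x in (0.0, 0.5, 2.0):
        print(x, poisson_kernel_numeric(x), 1 / (math.pi * (1 + x * x)))
```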

If we take $n = 1$ and $\Phi(\xi) = e^{-2\pi|\xi|}$ in Theorem 8.36, make the substitution $r = e^{-2\pi t}$, and write $A_rf$ in place of $f^t$, we obtain

$$A_rf(x) = \sum_{\kappa\in\mathbb{Z}}r^{|\kappa|}\hat f(\kappa)e^{2\pi i\kappa x} = \hat f(0) + \sum_{k=1}^\infty r^k\big(\hat f(k)e^{2\pi ikx} + \hat f(-k)e^{-2\pi ikx}\big). \tag{8.38}$$

This formula is a special case of one of the classical methods for summing a (possibly) divergent series. Namely, if $\sum_0^\infty a_k$ is a series of complex numbers, for $0 < r < 1$ its $r$th Abel mean is the series $\sum_0^\infty r^ka_k$. If the latter series converges for $r < 1$ to the sum $S(r)$ and the limit $S = \lim_{r\nearrow1}S(r)$ exists, the series $\sum_0^\infty a_k$ is said to be Abel summable to $S$. If $\sum_0^\infty a_k$ converges to the sum $S$, then it is also Abel summable to $S$ (Exercise 27), but the Abel sum may exist even when the series diverges.
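Here is a minimal sketch of Abel means for a numerical series. It assumes the terms grow at most polynomially, so that stopping once $r^k$ is negligible is safe. The examples are:

- the divergent series $\sum(-1)^k$, with $S(r) = 1/(1+r) \to \frac12$;
- the divergent series $\sum(-1)^k(k+1)$, with $S(r) = 1/(1+r)^2 \to \frac14$;
- a convergent geometric series, whose Abel sum agrees with its ordinary sum.

The function name is ours.

```python
def abel_mean(a, r, tol=1e-15):
    """r-th Abel mean S(r) = sum_{k>=0} r^k a(k) for 0 <= r < 1.

    a is a function k -> a_k. Summation stops once r^k < tol, which assumes
    |a_k| grows at most polynomially (true for the examples below)."""
    total, k, rk = 0.0, 0, 1.0
    while rk >= tol:
        total += rk * a(k)
        k += 1
        rk *= r
    return total


if __name__ == "__main__":
    for r in (0.9, 0.99, 0.999):
        # sum (-1)^k diverges, but S(r) = 1/(1 + r) tends to 1/2 as r -> 1
        print(r, abel_mean(lambda k: (-1) ** k, r))
```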

In (8.38), $A_rf(x)$ is the $r$th Abel mean of the Fourier series of $f$, in which the $k$th and $(-k)$th terms are grouped together to make a series indexed by the nonnegative integers. It has the following complex-variable interpretation: If we set $z = re^{2\pi ix}$, then

$$A_rf(x) = \sum_0^\infty \hat f(k)z^k + \sum_1^\infty \hat f(-k)\bar z^k.$$

The two series on the right define, respectively, a holomorphic and an antiholomorphic function on the unit disc $|z| < 1$. In particular, $A_rf(x)$ is a harmonic function of $z$ on the unit disc, and the fact that $A_rf \to f$ as $r \to 1$ means that $f$ is the boundary value of this function on the unit circle. See also Exercise 28.

Our final example is the function $\Phi(\xi) = \max(1-|\xi|, 0)$ with $n = 1$. Its inverse Fourier transform is

$$\phi(x) = \int_{-1}^0(1+\xi)e^{2\pi i\xi x}\,d\xi + \int_0^1(1-\xi)e^{2\pi i\xi x}\,d\xi = \frac{e^{2\pi ix} + e^{-2\pi ix} - 2}{(2\pi ix)^2} = \frac{\sin^2\pi x}{\pi^2x^2}.$$

If we use this $\Phi$ in Theorem 8.36, take $t = (m+1)^{-1}$ ($m = 0, 1, 2, \ldots$), and write $\sigma_mf(x)$ for $f^{1/(m+1)}(x)$, we obtain

$$\sigma_mf(x) = \sum_{-m}^{m}\Big(1 - \frac{|\kappa|}{m+1}\Big)\hat f(\kappa)e^{2\pi i\kappa x}. \tag{8.39}$$

This is an instance of another classical method for summing divergent series. Namely, if $\sum_0^\infty a_k$ is a series of complex numbers, its $m$th Cesàro mean is the average of its first $m+1$ partial sums, $(m+1)^{-1}\sum_0^m s_n$, where $s_n = \sum_0^n a_k$. If the sequence of Cesàro means converges as $m \to \infty$ to a limit $S$, the series is said to be Cesàro summable to $S$. It is easily verified that if $\sum_0^\infty a_k$ converges to $S$, then it is Cesàro summable to $S$ (but perhaps not conversely), and that $\sigma_mf(x)$ is the $m$th Cesàro mean of the Fourier series of $f$ with the $k$th and $(-k)$th terms grouped together. See Exercise 29, and also Exercise 33 in the next section.
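This sketch computes Cesàro means as running averages of partial sums. It also checks them against the weighted form $(m+1)^{-1}\sum_0^m(m+1-k)a_k$ of Exercise 29a. The function names are ours, and the series is assumed to be given as a finite Python list of terms.

```python
def cesaro_means(a):
    """Return [sigma_0, ..., sigma_{len(a)-1}] for the series sum a[k]:
    sigma_m is the average of the partial sums s_0, ..., s_m."""
    means, s, running = [], 0.0, 0.0
    for m, term in enumerate(a):
        s += term          # s = s_m
        running += s       # running = s_0 + ... + s_m
        means.append(running / (m + 1))
    return means


def cesaro_weighted(a, m):
    """Equivalent weighted form (Exercise 29a): (m+1)^{-1} sum_{k<=m} (m+1-k) a_k."""
    return sum((m + 1 - k) * a[k] for k in range(m + 1)) / (m + 1)


if __name__ == "__main__":
    a = [(-1) ** k for k in range(2001)]
    print(cesaro_means(a)[-1])  # 1001/2001, close to 1/2
```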


Exercises 


24. State and prove an analogue of Theorem 8.33 for functions on $\mathbb{R}$. (In addition to the hypotheses that $f$ be locally absolutely continuous and that $f' \in L^p$ for some $p > 1$, you will need some further conditions on $f$ and/or $f'$ at infinity to make the argument work. Make them as mild as possible.)


25. For $0 < \alpha \le 1$, let $\Lambda_\alpha(\mathbb{T})$ be the space of Hölder continuous functions on $\mathbb{T}$ of exponent $\alpha$ as in Exercise 11 in §5.1. Suppose $1 < p < \infty$ and $p^{-1} + q^{-1} = 1$.
a. If $f$ satisfies the hypotheses of Theorem 8.33, then $f \in \Lambda_{1/q}(\mathbb{T})$, but $f$ need not lie in $\Lambda_\alpha(\mathbb{T})$ for any $\alpha > 1/q$. (Hint: $f(b) - f(a) = \int_a^b f'(t)\,dt$.)
b. If $\alpha < 1$, $\Lambda_\alpha(\mathbb{T})$ contains functions that are not of bounded variation and hence are not absolutely continuous. (But cf. Exercise 37 in §3.5.)


26. The aim of this exercise is to show that the inverse Fourier transform of $e^{-2\pi|\xi|}$ on $\mathbb{R}^n$ is

$$\phi(x) = \frac{\Gamma(\frac12(n+1))}{\pi^{(n+1)/2}}(1+|x|^2)^{-(n+1)/2}.$$

a. If $\beta > 0$, $e^{-\beta} = \frac{1}{\pi}\int_{-\infty}^\infty(1+t^2)^{-1}e^{-i\beta t}\,dt$. (Use (8.37).)
b. If $\beta > 0$, $e^{-\beta} = \int_0^\infty(\pi s)^{-1/2}e^{-s}e^{-\beta^2/4s}\,ds$. (Use (a), Proposition 8.24, and the formula $(1+t^2)^{-1} = \int_0^\infty e^{-(1+t^2)s}\,ds$.)
c. Let $\beta = 2\pi|\xi|$ where $\xi \in \mathbb{R}^n$; then the formula in (b) expresses $e^{-2\pi|\xi|}$ as a superposition of dilated Gauss kernels. Use Proposition 8.24 again to derive the asserted formula for $\phi$.

27. Suppose that the numerical series $\sum_0^\infty a_k$ is convergent.
a. Let $S'_m = \sum_m^\infty a_k$. Then $\sum_m^\infty r^ka_k = \sum_{m+1}^\infty S'_k(r^k - r^{k-1}) + S'_mr^m$ for $0 < r < 1$ (“summation by parts”).
b. $\big|\sum_m^\infty r^ka_k\big| \le 2\sup_{k\ge m}|S'_k|$.
c. The series $\sum_0^\infty r^ka_k$ is uniformly convergent for $0 \le r \le 1$, and hence its sum $S(r)$ is continuous there. In particular, $\sum_0^\infty a_k = \lim_{r\nearrow1}S(r)$.

28. Suppose that $f \in L^1(\mathbb{T})$, and let $A_rf$ be given by (8.38).
a. $A_rf = f*P_r$, where $P_r(x) = \sum_{-\infty}^\infty r^{|k|}e^{2\pi ikx}$ is the Poisson kernel for $\mathbb{T}$.
b. $P_r(x) = (1-r^2)/(1+r^2-2r\cos2\pi x)$.
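Part (b) can be spot-checked by comparing a truncated version of the series in (a) with the closed form. The truncation at 2000 terms is an assumption that is harmless for the values $r \le 0.9$ used here. The function names are ours.

```python
import math


def poisson_series(r, x, K=2000):
    """Truncated P_r(x) = sum_k r^|k| e^{2 pi i k x} = 1 + 2 sum_{k=1}^K r^k cos(2 pi k x)."""
    return 1.0 + 2.0 * sum(r ** k * math.cos(2 * math.pi * k * x) for k in range(1, K + 1))


def poisson_closed(r, x):
    """Closed form from Exercise 28b."""
    return (1 - r * r) / (1 + r * r - 2 * r * math.cos(2 * math.pi * x))


if __name__ == "__main__":
    print(poisson_series(0.9, 0.1), poisson_closed(0.9, 0.1))
```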


29. Given $\{a_k\}_0^\infty \subset \mathbb{C}$, let $s_n = \sum_0^na_k$ and $\sigma_m = (m+1)^{-1}\sum_0^ms_n$.
a. $\sigma_m = (m+1)^{-1}\sum_0^m(m+1-k)a_k$.
b. If $\lim_{n\to\infty}s_n$ exists, then so does $\lim_{m\to\infty}\sigma_m$, and the two limits are equal.
c. The series $\sum_0^\infty(-1)^k$ diverges but is Abel and Cesàro summable to $\frac12$.


30. If $f \in L^1(\mathbb{R}^n)$, $f$ is continuous at 0, and $\hat f \ge 0$, then $\hat f \in L^1$. (Use Theorem 8.35c and Fatou’s lemma.)


31. Suppose $a > 0$. Use (8.37) to show that

$$\sum_{-\infty}^\infty\frac{1}{k^2+a^2} = \frac{\pi}{a}\cdot\frac{1+e^{-2\pi a}}{1-e^{-2\pi a}}.$$

Then subtract $a^{-2}$ from both sides and let $a \to 0$ to show that $\sum_1^\infty k^{-2} = \pi^2/6$.
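This sketch checks both steps of Exercise 31 numerically. It compares a truncated lattice sum (tail about $2/K$) with the closed form. It then evaluates $\frac12[(\text{closed form}) - a^{-2}]$ at a small $a$, which should be close to $\pi^2/6$. The truncation level and names are ours.

```python
import math


def lattice_sum(a, K=200000):
    """Truncated sum_{k=-K}^{K} 1 / (k^2 + a^2); the omitted tail is about 2/K."""
    return 1.0 / (a * a) + 2.0 * sum(1.0 / (k * k + a * a) for k in range(1, K + 1))


def closed_form(a):
    """(pi / a) * (1 + e^{-2 pi a}) / (1 - e^{-2 pi a}), i.e. (pi / a) coth(pi a)."""
    q = math.exp(-2 * math.pi * a)
    return math.pi / a * (1 + q) / (1 - q)


if __name__ == "__main__":
    a = 1e-3
    print((closed_form(a) - 1 / a ** 2) / 2, math.pi ** 2 / 6)
```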


32. A $C^\infty$ function $f$ on $\mathbb{R}$ is real-analytic if for every $x \in \mathbb{R}$, $f$ is the sum of its Taylor series based at $x$ in some neighborhood of $x$. If $f$ is periodic and we regard $f$ as a function on $S = \{z \in \mathbb{C} : |z| = 1\}$, this condition is equivalent to the condition that $f$ be the restriction to $S$ of a holomorphic function on some neighborhood of $S$. Show that $f \in C^\infty(\mathbb{T})$ is real-analytic iff $|\hat f(\kappa)| \le Ce^{-\epsilon|\kappa|}$ for some $C, \epsilon > 0$. (See the discussion of the Abel means $A_rf$ in the text, and note that $\bar z = z^{-1}$ when $|z| = 1$.)


8.5 POINTWISE CONVERGENCE OF FOURIER SERIES 


The techniques and results of the previous two sections, involving such things as 
LP norms and summability methods, are relatively modern; they were preceded 
historically by the study of pointwise convergence of one-dimensional Fourier series. 
Although the latter is one of the oldest parts of Fourier analysis, it is also one of 
the most difficult — unfortunately for the mathematicians who developed it, but 
fortunately for us who are the beneficiaries of the ideas and techniques they invented 
in doing so. A thorough study of this issue is beyond the scope of this book, but we 
would be remiss not to present a few of the classic results. 

To set the stage, suppose $f \in L^1(\mathbb{T})$. We denote by $S_mf$ the $m$th symmetric partial sum of the Fourier series of $f$:

$$S_mf(x) = \sum_{-m}^m\hat f(k)e^{2\pi ikx}.$$

From the definition of $\hat f(k)$, we have

$$S_mf(x) = \sum_{-m}^m\int_{\mathbb{T}}f(y)e^{2\pi ik(x-y)}\,dy = f*D_m(x),$$

where $D_m$ is the $m$th Dirichlet kernel:

$$D_m(x) = \sum_{-m}^me^{2\pi ikx}.$$

The terms in this sum form a geometric progression, so

$$D_m(x) = e^{-2\pi imx}\sum_0^{2m}e^{2\pi ikx} = e^{-2\pi imx}\,\frac{e^{2\pi i(2m+1)x} - 1}{e^{2\pi ix} - 1}.$$

Multiplying top and bottom by $e^{-\pi ix}$ yields the standard closed formula for $D_m$:

$$D_m(x) = \frac{e^{\pi i(2m+1)x} - e^{-\pi i(2m+1)x}}{e^{\pi ix} - e^{-\pi ix}} = \frac{\sin(2m+1)\pi x}{\sin\pi x}. \tag{8.40}$$
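The closed formula (8.40) can be compared with the defining exponential sum at a few points. At integers the quotient is indeterminate, so the sketch returns its limiting value $2m+1$ there. The function names are ours.

```python
import cmath
import math


def dirichlet_sum(m, x):
    """D_m(x) = sum_{k=-m}^{m} e^{2 pi i k x}, summed directly; the result is real."""
    return sum(cmath.exp(2j * math.pi * k * x) for k in range(-m, m + 1)).real


def dirichlet_closed(m, x):
    """Closed formula (8.40); at integers the (removable) value is 2m + 1."""
    s = math.sin(math.pi * x)
    if abs(s) < 1e-12:
        return 2.0 * m + 1.0
    return math.sin((2 * m + 1) * math.pi * x) / s


if __name__ == "__main__":
    print(dirichlet_sum(5, 0.013), dirichlet_closed(5, 0.013))
```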

The difficulty with the partial sums $S_mf$, as opposed to (for example) the Abel or Cesàro means, can be summed up in a nutshell as follows. $S_mf$ can be regarded as a special case of the construction in Theorem 8.36; in fact, with the notation used there, $S_mf = f^{1/m}$ if we take $\Phi = \chi_{[-1,1]}$. But $\chi_{[-1,1]}$ does not satisfy the hypotheses of Theorem 8.36, because its inverse Fourier transform $(\pi x)^{-1}\sin2\pi x$ (Exercise 15a) is not in $L^1(\mathbb{R})$. On the level of periodic functions, this is reflected in the fact that although $D_m \in L^1(\mathbb{T})$ for all $m$, $\|D_m\|_1 \to \infty$ as $m \to \infty$ (Exercise 34).

Among the consequences of this is that the Fourier series of a continuous function $f$ need not converge pointwise, much less uniformly, to $f$; see Exercise 35. (This does not contradict the fact that trigonometric polynomials are dense in $C(\mathbb{T})$! It just means that if one wants to approximate a function $f \in C(\mathbb{T})$ uniformly by trigonometric polynomials, one should not count on the partial sums $S_mf$ to do the job; the Cesàro means defined by (8.39) work much better in general.) To obtain positive results for pointwise convergence, one must look in other directions.

The first really general theorem about pointwise convergence of Fourier series was obtained in 1829 by Dirichlet, who showed that $S_mf(x) \to \frac12[f(x+) + f(x-)]$ for every $x$ provided that $f$ is piecewise continuous and piecewise monotone. Later refinements of the argument showed that what is really needed is for $f$ to be of bounded variation. We now prove this theorem, for which we need two lemmas. The first one is a slight generalization of one of the more arcane theorems of elementary calculus, the “second mean value theorem for integrals.”








8.41 Lemma. Let $\phi$ and $\psi$ be real-valued functions on $[a,b]$. Suppose that $\phi$ is monotone and right continuous on $[a,b]$ and $\psi$ is continuous on $[a,b]$. Then there exists $\eta \in [a,b]$ such that

$$\int_a^b\phi(x)\psi(x)\,dx = \phi(a)\int_a^\eta\psi(x)\,dx + \phi(b)\int_\eta^b\psi(x)\,dx.$$


Proof. Adding a constant $c$ to $\phi$ changes both sides of the equation by the amount $c\int_a^b\psi(x)\,dx$, so we may assume that $\phi(a) = 0$. We may also assume that $\phi$ is increasing; otherwise replace $\phi$ by $-\phi$. Let $\Psi(x) = \int_x^b\psi(t)\,dt$ (so that $\Psi' = -\psi$) and apply Theorem 3.36:

$$\int_a^b\phi(x)\psi(x)\,dx = -\int_a^b\phi(x)\Psi'(x)\,dx = -\phi(b)\Psi(b) + \phi(a)\Psi(a) + \int_{(a,b]}\Psi(x)\,d\phi(x).$$

The endpoint evaluations vanish since $\phi(a) = \Psi(b) = 0$. Since $\phi$ is increasing and $\int_{(a,b]}d\phi = \phi(b) - \phi(a) = \phi(b)$, if $m$ and $M$ are the minimum and maximum values of $\Psi$ on $[a,b]$ we have $m\phi(b) \le \int_{(a,b]}\Psi\,d\phi \le M\phi(b)$. By the intermediate value theorem, then, there exists $\eta \in [a,b]$ such that $\int_{(a,b]}\Psi\,d\phi = \Psi(\eta)\phi(b)$, which is the desired result. ∎
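Lemma 8.41 is an existence statement, but for concrete $\phi$ and $\psi$ a point $\eta$ can be located numerically. The sketch below works in three steps:

1. It evaluates both sides with Simpson's rule.
2. It scans a grid for a sign change of their difference.
3. It refines the bracketed root by bisection.

It assumes the difference actually changes sign somewhere on the grid and raises an error otherwise. The names and grid sizes are ours. The usage example takes $\phi(x) = x^2$ and $\psi(x) = \cos3x$ on $[0,1]$, where $\int_0^1x^2\cos3x\,dx = \frac{7}{27}\sin3 + \frac29\cos3$.

```python
import math


def simpson(f, a, b, n=200):
    """Composite Simpson rule on [a, b] with an even number n of subintervals."""
    h = (b - a) / n
    s = f(a) + f(b) + sum((4 if i % 2 else 2) * f(a + i * h) for i in range(1, n))
    return s * h / 3


def second_mvt_eta(phi, psi, a, b, grid=200, iters=60):
    """Find eta in [a, b] with
        int_a^b phi*psi = phi(a) int_a^eta psi + phi(b) int_eta^b psi
    (Lemma 8.41, phi monotone): grid scan for a sign change, then bisection."""
    target = simpson(lambda x: phi(x) * psi(x), a, b)

    def g(eta):
        return phi(a) * simpson(psi, a, eta) + phi(b) * simpson(psi, eta, b) - target

    xs = [a + (b - a) * i / grid for i in range(grid + 1)]
    gs = [g(x) for x in xs]
    for i in range(grid):
        if gs[i] == 0.0:
            return xs[i]
        if gs[i] * gs[i + 1] < 0:
            lo, hi, g_lo = xs[i], xs[i + 1], gs[i]
            for _ in range(iters):
                mid = 0.5 * (lo + hi)
                g_mid = g(mid)
                if g_lo * g_mid <= 0:
                    hi = mid
                else:
                    lo, g_lo = mid, g_mid
            return 0.5 * (lo + hi)
    if gs[-1] == 0.0:
        return xs[-1]
    raise ValueError("no sign change on the grid; try a finer grid")


if __name__ == "__main__":
    print(second_mvt_eta(lambda x: x * x, lambda x: math.cos(3 * x), 0.0, 1.0))
```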


8.42 Lemma. There is a constant $C < \infty$ such that for every $m \ge 0$ and every $[a,b] \subset [-\frac12, \frac12]$,

$$\Big|\int_a^bD_m(x)\,dx\Big| \le C.$$

Moreover, $\int_0^{1/2}D_m(x)\,dx = \int_{-1/2}^0D_m(x)\,dx = \frac12$ for all $m$.


Proof. By (8.40),

$$\int_a^bD_m(x)\,dx = \int_a^b\frac{\sin(2m+1)\pi x}{\pi x}\,dx + \int_a^b\sin(2m+1)\pi x\Big[\frac{1}{\sin\pi x} - \frac{1}{\pi x}\Big]\,dx.$$

Since $(\sin\pi x)^{-1} - (\pi x)^{-1}$ is bounded on $[-\frac12, \frac12]$ and $|\sin(2m+1)\pi x| \le 1$, the second integral on the right is bounded in absolute value by a constant. With the substitution $y = (2m+1)\pi x$, the first one becomes

$$\int_{(2m+1)\pi a}^{(2m+1)\pi b}\frac{\sin y}{\pi y}\,dy = \frac{\mathrm{Si}[(2m+1)\pi b] - \mathrm{Si}[(2m+1)\pi a]}{\pi},$$

where $\mathrm{Si}(x) = \int_0^xy^{-1}\sin y\,dy$. But $\mathrm{Si}(x)$ is continuous and approaches the finite limits $\pm\frac12\pi$ as $x \to \pm\infty$ (see Exercise 59b in §2.6), so $\mathrm{Si}(x)$ is bounded. This proves the first assertion. As for the second one,

$$\int_{-1/2}^{1/2}D_m(x)\,dx = \sum_{-m}^m\int_{-1/2}^{1/2}e^{2\pi ikx}\,dx = 1$$

(only the term with $k = 0$ is nonzero), so since $D_m$ is even,

$$\int_0^{1/2}D_m(x)\,dx = \int_{-1/2}^0D_m(x)\,dx = \tfrac12. \quad ∎$$

8.43 Theorem. If $f \in BV(\mathbb{T})$, that is, if $f$ is periodic on $\mathbb{R}$ and of bounded variation on $[-\frac12, \frac12]$, then

$$\lim_{m\to\infty}S_mf(x) = \tfrac12[f(x+) + f(x-)] \quad\text{for every } x.$$

In particular, $\lim_{m\to\infty}S_mf(x) = f(x)$ at every $x$ at which $f$ is continuous.


Proof. We begin by making some reductions. In examining the convergence of $S_mf(x)$, we may assume that $x = 0$ (by replacing $f$ with the translated function $\tau_{-x}f$), that $f$ is real-valued (by considering the real and imaginary parts separately), and that $f$ is right continuous (since replacing $f(t)$ by $f(t+)$ affects neither $\hat f(k)$ nor $[f(0+) + f(0-)]$). In this case, by Theorem 3.27b, on the interval $[-\frac12, \frac12)$ we can write $f$ as the difference of two right continuous increasing functions $g$ and $h$. If these functions are extended to $\mathbb{R}$ by periodicity, they are again of bounded variation, and it is enough to show that $S_mg(0) \to \frac12[g(0+) + g(0-)]$ and likewise for $h$.

In short, it suffices to consider the case where $x = 0$ and $f$ is increasing and right continuous on $[-\frac12, \frac12)$. Since $D_m$ is even, we have $S_mf(0) = f*D_m(0) = \int_{-1/2}^{1/2}f(x)D_m(x)\,dx$, so by Lemma 8.42,

$$S_mf(0) - \tfrac12[f(0+) + f(0-)] = \int_0^{1/2}[f(x) - f(0+)]D_m(x)\,dx + \int_{-1/2}^0[f(x) - f(0-)]D_m(x)\,dx.$$
0 —1/2 


We shall show that the first integral on the right tends to zero as $m \to \infty$; a similar argument shows that the second integral also tends to zero, thereby completing the proof.

Given $\epsilon > 0$, choose $\delta > 0$ small enough so that $f(\delta) - f(0+) < \epsilon/C$, where $C$ is as in Lemma 8.42. Then by Lemma 8.41, for some $\eta \in [0, \delta]$,

$$\Big|\int_0^\delta[f(x) - f(0+)]D_m(x)\,dx\Big| = [f(\delta) - f(0+)]\,\Big|\int_\eta^\delta D_m(x)\,dx\Big|,$$

which is less than $\epsilon$. On the other hand, by (8.40),

$$\int_\delta^{1/2}[f(x) - f(0+)]D_m(x)\,dx = \hat g_+(-m) - \hat g_-(m),$$

where $g_\pm$ is the periodic function given on the interval $[-\frac12, \frac12)$ by

$$g_\pm(x) = \frac{[f(x) - f(0+)]\,e^{\pm\pi ix}}{2i\sin\pi x}\,\chi_{[\delta,1/2)}(x).$$

But $g_\pm \in L^1(\mathbb{T})$, so $\hat g_\pm(\mp m) \to 0$ as $m \to \infty$ by the Riemann–Lebesgue lemma (the periodic analogue of Theorem 8.22f). Therefore,

$$\limsup_{m\to\infty}\Big|\int_0^{1/2}[f(x) - f(0+)]D_m(x)\,dx\Big| \le \epsilon$$

for every $\epsilon > 0$, and we are done. ∎

One of the less attractive features of Fourier series is that bad behavior of a function at one point affects the behavior of its Fourier series at all points. For example, if $f$ has even one jump discontinuity, then $\hat f$ cannot be in $l^1(\mathbb{Z})$ (if it were, $f$ would agree a.e. with the continuous sum of its Fourier series), and so the series $\sum\hat f(k)e^{2\pi ikx}$ cannot converge absolutely at any point. However, to a limited extent the convergence of the series at a point $x$ depends only on the behavior of $f$ near $x$, as explained in the following localization theorem.


8.44 Theorem. If $f$ and $g$ are in $L^1(\mathbb{T})$ and $f = g$ on an open interval $I$, then $S_mf - S_mg \to 0$ uniformly on compact subsets of $I$.

Proof. It is enough to assume that $g = 0$ (consider $f - g$), and by translating $f$ we may assume that $I$ is centered at 0, say $I = (-c, c)$ where $c \le \frac12$. Fix $\delta < c$; we shall show that if $f = 0$ on $I$ then $S_mf \to 0$ uniformly on $[-\delta, \delta]$.

The first step is to show that $S_mf \to 0$ pointwise on $[-\delta, \delta]$, and the argument is similar to the preceding proof. Namely, by (8.40) we have

$$S_mf(x) = \int_{-1/2}^{1/2}f(x-y)\,\frac{\sin(2m+1)\pi y}{\sin\pi y}\,dy = \hat g_{x,+}(-m) - \hat g_{x,-}(m),$$

where

$$g_{x,\pm}(y) = \frac{f(x-y)\,e^{\pm\pi iy}}{2i\sin\pi y}.$$

Since $f(x-y) = 0$ on a neighborhood of the zeros of $\sin\pi y$, the functions $g_{x,\pm}$ are in $L^1(\mathbb{T})$, so $\hat g_{x,\pm}(\mp m) \to 0$ by the Riemann–Lebesgue lemma.

The next step is to show that if $x_1, x_2 \in [-\delta, \delta]$, then $S_mf(x_1) - S_mf(x_2)$ vanishes as $x_1 - x_2 \to 0$, uniformly in $m$. By (8.40) again,

$$S_mf(x_1) - S_mf(x_2) = \int_{-1/2}^{1/2}\frac{\sin(2m+1)\pi y}{\sin\pi y}\,[f(x_1-y) - f(x_2-y)]\,dy.$$

But $f(x_1-y) - f(x_2-y) = 0$ for $|y| < c - \delta$, and for $c - \delta \le |y| \le \frac12$ we have

$$\Big|\frac{\sin(2m+1)\pi y}{\sin\pi y}\Big| \le \frac{1}{\sin\pi(c-\delta)} = A,$$

where $A$ is independent of $m$. Hence

$$|S_mf(x_1) - S_mf(x_2)| \le A\int_{-1/2}^{1/2}|f(x_1-y) - f(x_2-y)|\,dy = A\|\tau_{x_1}f - \tau_{x_2}f\|_1,$$

which vanishes as $x_1 - x_2 \to 0$ by (the periodic analogue of) Proposition 8.5.

Now, given $\epsilon > 0$, we can choose $\eta$ small enough so that if $x_1, x_2 \in [-\delta, \delta]$ and $|x_1 - x_2| < \eta$, then $|S_mf(x_1) - S_mf(x_2)| < \epsilon/2$ for all $m$. Choose $x_1, \ldots, x_k \in [-\delta, \delta]$ so that the intervals $|x - x_j| < \eta$ cover $[-\delta, \delta]$. Since $S_mf(x_j) \to 0$ for each $j$, we can choose $M$ large enough so that $|S_mf(x_j)| < \epsilon/2$ for $m > M$ and $1 \le j \le k$. If $|x| \le \delta$, then, we have $|x - x_j| < \eta$ for some $j$, so

$$|S_mf(x)| \le |S_mf(x) - S_mf(x_j)| + |S_mf(x_j)| < \epsilon$$

for $m > M$, and we are done. ∎

8.45 Corollary. Suppose that $f \in L^1(\mathbb{T})$ and $I$ is an open interval of length $\le 1$.
a. If $f$ agrees on $I$ with a function $g$ such that $\hat g \in l^1(\mathbb{Z})$, then $S_mf \to f$ uniformly on compact subsets of $I$.
b. If $f$ is absolutely continuous on $I$ and $f' \in L^p(I)$ for some $p > 1$, then $S_mf \to f$ uniformly on compact subsets of $I$.

Proof. If $f = g$ on $I$, then $S_mf - f = S_mf - g = (S_mf - S_mg) + (S_mg - g)$ on $I$, and if $\hat g \in l^1(\mathbb{Z})$, then $S_mg \to g$ uniformly on $\mathbb{R}$; (a) follows. As for (b), given $[a_0, b_0] \subset I$, pick $a < a_0$ and $b > b_0$ so that $[a,b] \subset I$, and let $g$ be the continuous periodic function that equals $f$ on $[a,b]$ and is linear on $[b, a+1]$ (which is unique since $g(b) = f(b)$ and $g(a+1) = g(a) = f(a)$). Under the hypotheses of (b), $g$ is absolutely continuous on $\mathbb{R}$ and $g' \in L^p(\mathbb{T})$, so $\hat g \in l^1(\mathbb{Z})$ by Theorem 8.33. Thus $S_mf \to f$ uniformly on $[a_0, b_0]$ by (a). ∎

Finally, we discuss the behavior of $S_mf$ near a jump discontinuity of $f$. Let us first consider a simple example: Let

$$\phi(x) = \tfrac12 - x + [x] \qquad ([x] = \text{greatest integer} \le x). \tag{8.46}$$

Then $\phi$ is periodic and is $C^\infty$ except for jump discontinuities at the integers, where $\phi(j+) - \phi(j-) = 1$. It is easy to check that $\hat\phi(0) = 0$ and $\hat\phi(k) = (2\pi ik)^{-1}$ for $k \ne 0$ (Exercise 13a), so that

$$S_m\phi(x) = \sum_{0<|k|\le m}\frac{e^{2\pi ikx}}{2\pi ik} = \sum_1^m\frac{\sin2\pi kx}{\pi k}.$$

From Corollary 8.45 it follows that $S_m\phi \to \phi$ uniformly on any compact set not containing an integer, and it is obvious that $S_m\phi(x) = 0$ when $x$ is an integer. But near the integers a peculiar thing happens: $S_m\phi$ contains a sequence of spikes that overshoot and undershoot $\phi$, as shown in Figure 8.1, and as $m \to \infty$ the spikes tend to zero in width but not in height. In fact, when $m$ is large the value of $S_m\phi$ at its first maximum to the right of 0 is about 0.5895, about 18% greater than $\phi(0+) = \frac12$. This is known as the Gibbs phenomenon; the precise statement and proof are given in Exercise 37.
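The 0.5895 figure can be reproduced numerically. By Exercise 37b, $S_m\phi(1/(2m+1))$ tends to $\pi^{-1}\mathrm{Si}(\pi)$, where $\mathrm{Si}(x) = \int_0^x y^{-1}\sin y\,dy$. This sketch evaluates that partial sum for a large $m$ and computes $\mathrm{Si}(\pi)$ with Simpson's rule. The helper names are ours.

```python
import math


def partial_sum_phi(m, x):
    """S_m phi(x) = sum_{k=1}^m sin(2 pi k x) / (pi k) for the sawtooth phi of (8.46)."""
    return sum(math.sin(2 * math.pi * k * x) / (math.pi * k) for k in range(1, m + 1))


def si(x, n=2000):
    """Sine integral Si(x) = int_0^x sin(y)/y dy by Simpson's rule (integrand = 1 at y = 0)."""
    def f(y):
        return math.sin(y) / y if y != 0.0 else 1.0

    h = x / n
    return h / 3 * (f(0.0) + f(x) + sum((4 if i % 2 else 2) * f(i * h) for i in range(1, n)))


GIBBS_LIMIT = si(math.pi) / math.pi  # limiting height of the first overshoot, about 0.5895

if __name__ == "__main__":
    for m in (10, 100, 1000):
        print(m, partial_sum_phi(m, 1 / (2 * m + 1)), GIBBS_LIMIT)
```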

Now suppose that $f$ is any periodic function on $\mathbb{R}$ having a jump discontinuity at $x = a$ (that is, $f(a+)$ and $f(a-)$ exist and are unequal). Then the function

$$g(x) = f(x) - [f(a+) - f(a-)]\,\phi(x-a)$$

is continuous at every point where $f$ is, and also at $x = a$ provided that we (re)define $g(a)$ to be $\frac12[f(a+) + f(a-)]$, as the jumps in $f$ and $\phi$ cancel out. If $g$ satisfies one of the hypotheses of Corollary 8.45 on an interval $I$ containing $a$, the Fourier series of $g$ will converge uniformly near $a$, and hence the Fourier series of $f$ will exhibit the same Gibbs phenomenon as that of $\phi$.

Finally, suppose that $f$ is periodic and continuous except at finitely many points $a_1, \ldots, a_k \in \mathbb{T}$, where $f$ has jump discontinuities. We can then subtract off all the jumps to form a continuous function $g$:

$$g(x) = f(x) - \sum_{j=1}^k[f(a_j+) - f(a_j-)]\,\phi(x - a_j).$$

If $f$ satisfies some mild smoothness conditions (for example, if $f$ is absolutely continuous on any interval not containing any $a_j$ and $f' \in L^p$ for some $p > 1$), then $\hat g$ will be in $l^1(\mathbb{Z})$. Conclusion: $S_mf \to f$ uniformly on any closed interval not containing any $a_j$, $S_mf(a_j) \to \frac12[f(a_j+) + f(a_j-)]$, and $S_mf$ exhibits the Gibbs phenomenon near every $a_j$.

Fig. 8.1 The Gibbs phenomenon: the graph of $y = \sum_{k=1}^m(\pi k)^{-1}\sin2\pi kx$, $-\frac12 \le x \le \frac12$.


Exercises 


33. Let $\sigma_mf$ be the Cesàro means of the Fourier series of $f$ given by (8.39).
a. $\sigma_mf = f*F_m$, where $F_m = (m+1)^{-1}\sum_0^mD_k$ and $D_k$ is the $k$th Dirichlet kernel. (See Exercise 29a.) $F_m$ is called the $m$th Fejér kernel.
b. $F_m(x) = \sin^2(m+1)\pi x\,/\,[(m+1)\sin^2\pi x]$. (Use (8.40) and the fact that $\sin(2k+1)\pi x = \mathrm{Im}\,e^{(2k+1)\pi ix}$.)
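Exercise 33b can be checked against the defining average of Dirichlet kernels. The check also illustrates a point behind the remark that Cesàro means behave better than partial sums: the Fejér kernel is nonnegative, unlike $D_m$. The function names are ours.

```python
import math


def dirichlet(k, x):
    """D_k(x) = 1 + 2 sum_{j=1}^k cos(2 pi j x)."""
    return 1.0 + 2.0 * sum(math.cos(2 * math.pi * j * x) for j in range(1, k + 1))


def fejer_average(m, x):
    """F_m(x) = (m + 1)^{-1} (D_0(x) + ... + D_m(x)), as in Exercise 33a."""
    return sum(dirichlet(k, x) for k in range(m + 1)) / (m + 1)


def fejer_closed(m, x):
    """Closed form of Exercise 33b; its removable value at integers is m + 1."""
    s = math.sin(math.pi * x)
    if abs(s) < 1e-12:
        return m + 1.0
    return math.sin((m + 1) * math.pi * x) ** 2 / ((m + 1) * s * s)


if __name__ == "__main__":
    print(fejer_average(10, 0.07), fejer_closed(10, 0.07))
```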


34. If $D_m$ is the $m$th Dirichlet kernel, $\|D_m\|_1 \to \infty$ as $m \to \infty$. (Make the substitution $y = (2m+1)\pi x$ and use Exercise 59a in §2.6.)
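The slow (logarithmic) growth of $\|D_m\|_1$ can be observed numerically. This sketch applies the midpoint rule to $\int_{-1/2}^{1/2}|D_m|$, using evenness and never sampling the removable singularity at 0. For $m = 1$ the exact value is $\frac13 + 2\sqrt3/\pi$. The grid size and names are our own choices.

```python
import math


def dirichlet_l1_norm(m, n=20000):
    """||D_m||_1 = int_{-1/2}^{1/2} |sin((2m+1) pi x) / sin(pi x)| dx, by the midpoint
    rule on [0, 1/2] (D_m is even); the midpoints never hit x = 0."""
    h = 0.5 / n
    total = 0.0
    for i in range(n):
        x = (i + 0.5) * h
        total += abs(math.sin((2 * m + 1) * math.pi * x) / math.sin(math.pi * x))
    return 2 * total * h


if __name__ == "__main__":
    for m in (1, 4, 16, 64, 256):
        print(m, dirichlet_l1_norm(m))  # grows roughly like (4 / pi^2) log m
```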








35. The purpose of this exercise is to show that the Fourier series of “most” continuous functions on $\mathbb{T}$ do not converge pointwise.
a. Define $\phi_m(f) = S_mf(0)$. Then $\phi_m \in C(\mathbb{T})^*$ and $\|\phi_m\| = \|D_m\|_1$.
b. The set of all $f \in C(\mathbb{T})$ such that the sequence $\{S_mf(0)\}$ converges is meager in $C(\mathbb{T})$. (Use Exercise 34 and the uniform boundedness principle.)
c. There exist $f \in C(\mathbb{T})$ (in fact, a residual set of such $f$’s) such that $\{S_mf(x)\}$ diverges for every $x$ in a dense subset of $\mathbb{T}$. (The result of (b) holds if the point 0 is replaced by any other point in $\mathbb{T}$. Apply Exercise 40 in §5.3.)

36. The Fourier transform is not surjective from $L^1(\mathbb{T})$ to $c_0(\mathbb{Z})$. (Use Exercise 34, and cf. Exercise 16c.)


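37. Let $\phi$ be given by (8.46) and let $\Delta_m = S_m\phi - \phi$.
a. $(d/dx)\Delta_m(x) = D_m(x)$ for $x \notin \mathbb{Z}$.
b. The first maximum of $\Delta_m$ to the right of 0 occurs at $x = (2m+1)^{-1}$, and

$$\lim_{m\to\infty}\Delta_m\Big(\frac{1}{2m+1}\Big) = \frac{1}{\pi}\int_0^\pi\frac{\sin t}{t}\,dt - \frac12 \approx 0.0895.$$

(Use (8.40) and the fact that $\Delta_m(x) = \int_0^x\Delta_m'(t)\,dt - \frac12$.)
c. More generally, the $j$th critical point of $\Delta_m$ to the right of 0 occurs at $x_j = j(2m+1)^{-1}$ ($1 \le j \le 2m$), and

$$\lim_{m\to\infty}\Delta_m(x_j) = \frac{1}{\pi}\int_0^{j\pi}\frac{\sin t}{t}\,dt - \frac12.$$

These numbers are positive for $j$ odd and negative for $j$ even. (See Exercise 59b in §2.6.)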


8.6 FOURIER ANALYSIS OF MEASURES 


We recall that $M(\mathbb{R}^n)$ is the space of complex Borel measures on $\mathbb{R}^n$ (which are automatically Radon measures by Theorem 7.8), and we embed $L^1(\mathbb{R}^n)$ into $M(\mathbb{R}^n)$ by identifying $f \in L^1$ with the measure $d\mu = f\,dm$. We shall need to define products of complex measures on Cartesian product spaces, which can easily be done in terms of products of positive measures by using Radon–Nikodym derivatives. Namely, if $\mu, \nu \in M(\mathbb{R}^n)$, we define $\mu\times\nu \in M(\mathbb{R}^n\times\mathbb{R}^n)$ by

$$d(\mu\times\nu)(x,y) = \frac{d\mu}{d|\mu|}(x)\,\frac{d\nu}{d|\nu|}(y)\,d(|\mu|\times|\nu|)(x,y).$$

If $\mu, \nu \in M(\mathbb{R}^n)$, we define their convolution $\mu*\nu \in M(\mathbb{R}^n)$ by $\mu*\nu(E) = \mu\times\nu(\alpha^{-1}(E))$, where $\alpha : \mathbb{R}^n\times\mathbb{R}^n \to \mathbb{R}^n$ is addition, $\alpha(x,y) = x + y$. In other words,

$$\mu*\nu(E) = \iint\chi_E(x+y)\,d\mu(x)\,d\nu(y). \tag{8.47}$$


8.48 Proposition.
a. Convolution of measures is commutative and associative.
b. For any bounded Borel measurable function $h$,

$$\int h\,d(\mu*\nu) = \iint h(x+y)\,d\mu(x)\,d\nu(y).$$

c. $\|\mu*\nu\| \le \|\mu\|\,\|\nu\|$.
d. If $d\mu = f\,dm$ and $d\nu = g\,dm$, then $d(\mu*\nu) = (f*g)\,dm$; that is, on $L^1$ the new and old definitions of convolution coincide.


Proof. Commutativity is obvious from Fubini’s theorem, as is associativity, for $\lambda*\mu*\nu$ is unambiguously defined by the formula

$$\lambda*\mu*\nu(E) = \iiint\chi_E(x+y+z)\,d\lambda(x)\,d\mu(y)\,d\nu(z).$$

Assertion (b) follows from (8.47) by the usual linearity and approximation arguments. In particular, taking $h = d|\mu*\nu|/d(\mu*\nu)$, since $|h| = 1$ we obtain

$$\|\mu*\nu\| = \int h\,d(\mu*\nu) \le \iint|h(x+y)|\,d|\mu|(x)\,d|\nu|(y) = \|\mu\|\,\|\nu\|,$$

which proves (c). Finally, if $d\mu = f\,dm$ and $d\nu = g\,dm$, for any bounded measurable $h$ we have

$$\int h\,d(\mu*\nu) = \iint h(x+y)f(x)g(y)\,dx\,dy = \iint h(x)f(x-y)g(y)\,dx\,dy = \int h(x)\,f*g(x)\,dx,$$

whence $d(\mu*\nu) = (f*g)\,dm$. ∎


We can also define convolutions of measures with functions in $L^p(\mathbb{R}^n, m)$, which we implicitly assume to be Borel measurable. (By Proposition 2.12, this is no restriction.)


8.49 Proposition. If $f \in L^p(\mathbb{R}^n)$ ($1 \le p \le \infty$) and $\mu \in M(\mathbb{R}^n)$, then the integral

$$f*\mu(x) = \int f(x-y)\,d\mu(y)$$

exists for a.e. $x$, $f*\mu \in L^p$, and $\|f*\mu\|_p \le \|f\|_p\|\mu\|$. (Here “$L^p$” and “a.e.” refer to Lebesgue measure.)


Proof. If $f$ and $\mu$ are nonnegative, then $f*\mu(x)$ exists (possibly being equal to $\infty$) for every $x$, and by Minkowski’s inequality for integrals,

$$\|f*\mu\|_p \le \int\|f(\cdot - y)\|_p\,d\mu(y) = \|f\|_p\|\mu\|.$$

In particular, $f*\mu(x) < \infty$ for a.e. $x$. In the general case this argument applies to $|f|*|\mu|$, and the result follows easily. ∎





In the case $p = 1$, the definition of $f*\mu$ in Proposition 8.49 coincides with the definition given earlier in which $f$ is identified with $f\,dm$, for

$$\int_E f*\mu(x)\,dx = \iint\chi_E(x)f(x-y)\,d\mu(y)\,dx = \iint\chi_E(x+y)f(x)\,dx\,d\mu(y)$$

for any Borel set $E$. Thus $L^1(\mathbb{R}^n)$ is not merely a subalgebra of $M(\mathbb{R}^n)$ with respect to convolution but an ideal.

We extend the Fourier transform from $L^1(\mathbb{R}^n)$ to $M(\mathbb{R}^n)$ in the obvious way: If $\mu \in M(\mathbb{R}^n)$, $\hat\mu$ is the function defined by

$$\hat\mu(\xi) = \int e^{-2\pi i\xi\cdot x}\,d\mu(x).$$

(The Fourier transform on measures is sometimes called the Fourier–Stieltjes transform.) Since $e^{-2\pi i\xi\cdot x}$ has modulus 1 and is continuous in $\xi$, the dominated convergence theorem shows that $\hat\mu$ is a bounded continuous function and that $\|\hat\mu\|_u \le \|\mu\|$. Moreover, by taking $h(x) = e^{-2\pi i\xi\cdot x}$ in Proposition 8.48b, one sees immediately that $(\mu*\nu)^\wedge = \hat\mu\hat\nu$.
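For finitely supported measures, formula (8.47) becomes a finite double sum. This makes it easy to check $(\mu*\nu)^\wedge = \hat\mu\hat\nu$ and $\|\mu*\nu\| \le \|\mu\|\,\|\nu\|$ (Proposition 8.48c) on examples. In this sketch a measure on $\mathbb{R}$ is represented as a dict from atoms to complex masses. Atoms are `Fraction`s so that equal sums $x + y$ merge into one key. The representation and names are our own illustrative choices. The last example shows that the inequality can be strict because of cancellation.

```python
import cmath
import math
from collections import defaultdict
from fractions import Fraction


def convolve(mu, nu):
    """Convolution of finitely supported complex measures on R given as {point: mass};
    by (8.47) the atom at x + y receives mass mu[x] * nu[y]."""
    out = defaultdict(complex)
    for x, a in mu.items():
        for y, b in nu.items():
            out[x + y] += a * b
    return dict(out)


def fourier(mu, xi):
    """mu-hat(xi) = int e^{-2 pi i xi x} d mu(x) = sum of mass * e^{-2 pi i xi x}."""
    return sum(c * cmath.exp(-2j * math.pi * xi * float(x)) for x, c in mu.items())


def total_variation(mu):
    """||mu|| for a finitely supported measure: the sum of |mass| over its atoms."""
    return sum(abs(c) for c in mu.values())


mu = {Fraction(0): 0.5, Fraction(1, 3): 0.5j}
nu = {Fraction(-1, 3): 1 - 1j, Fraction(1, 2): 2.0}

if __name__ == "__main__":
    conv = convolve(mu, nu)
    print(fourier(conv, 0.37), fourier(mu, 0.37) * fourier(nu, 0.37))
    lam = {Fraction(0): 1.0, Fraction(1): -1.0}
    rho = {Fraction(0): 1.0, Fraction(-1): 1.0}
    print(total_variation(convolve(lam, rho)), total_variation(lam) * total_variation(rho))  # 2.0 4.0
```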

We conclude by giving a useful criterion for vague convergence of measures in 
terms of Fourier transforms. 


8.50 Proposition. Suppose that $\mu_1, \mu_2, \dots$ and $\mu$ are in $M(\mathbb{R}^n)$. If $\|\mu_k\| \le C < \infty$
for all $k$ and $\hat\mu_k \to \hat\mu$ pointwise, then $\mu_k \to \mu$ vaguely.


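Proof. If $f \in \mathcal{S}$, then $f^\vee \in \mathcal{S}$ (Corollary 8.23), so by the Fourier inversion
theorem,

$$\int f\,d\mu_k = \iint f^\vee(y)e^{-2\pi i y\cdot x}\,dy\,d\mu_k(x) = \int f^\vee(y)\,\hat\mu_k(y)\,dy,$$

and likewise for $\mu$. Since $f^\vee \in L^1$ and $\|\hat\mu_k\|_u \le C$, the dominated convergence theorem implies that
$\int f\,d\mu_k \to \int f\,d\mu$. But $\mathcal{S}$ is dense in $C_0(\mathbb{R}^n)$ (Proposition 8.17), so by Proposition
5.17, $\int f\,d\mu_k \to \int f\,d\mu$ for all $f \in C_0(\mathbb{R}^n)$, that is, $\mu_k \to \mu$ vaguely. ∎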


This result has a partial converse: If $\mu_k \to \mu$ vaguely and $\|\mu_k\| \to \|\mu\|$, then
$\hat\mu_k \to \hat\mu$ pointwise. This follows from Exercise 26 in §7.3.


Exercises 


38. Work out the analogues of the results in this section for measures on the torus
$\mathbb{T}^n$.


39. If $\mu$ is a positive Borel measure on $\mathbb{T}$ with $\mu(\mathbb{T}) = 1$, then $|\hat\mu(k)| < 1$ for all
$k \ne 0$ unless $\mu$ is a linear combination, with positive coefficients, of the point masses
$\delta_{j/m}$ ($0 \le j \le m-1$) for some $m \in \mathbb{N}$, in which case $\hat\mu(jm) = 1$ for all $j \in \mathbb{Z}$.


40. $L^1(\mathbb{R}^n)$ is vaguely dense in $M(\mathbb{R}^n)$. (If $\mu \in M(\mathbb{R}^n)$, consider $\phi_t*\mu$ where
$\{\phi_t\}_{t>0}$ is an approximate identity.)


41. Let $A$ be the set of finite linear combinations of the point masses $\delta_x$, $x \in \mathbb{R}^n$.
Then $A$ is vaguely dense in $M(\mathbb{R}^n)$. (If $f$ is in the dense subset $C_c(\mathbb{R}^n)$ of $L^1(\mathbb{R}^n)$
and $g \in C_0(\mathbb{R}^n)$, approximate $\int fg$ by Riemann sums. Then use Exercise 40.)


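42. A function $\phi$ on $\mathbb{R}^n$ that satisfies $\sum_{j,k=1}^m z_j\bar z_k\,\phi(x_j - x_k) \ge 0$ for all $z_1, \dots, z_m \in$
$\mathbb{C}$ and all $x_1, \dots, x_m \in \mathbb{R}^n$, for any $m \in \mathbb{N}$, is called positive definite. If $\mu \in M(\mathbb{R}^n)$
is positive, then $\hat\mu$ is positive definite.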
8.7 APPLICATIONS TO PARTIAL DIFFERENTIAL EQUATIONS


In this section we present a few of the many applications of Fourier analysis to the 
theory of partial differential equations; others will be found in Chapter 9. We shall 
use the term differential operator to mean a linear partial differential operator with 
smooth coefficients, that is, an operator L of the form 


$$Lf(x) = \sum_{|\alpha|\le m} a_\alpha(x)\,\partial^\alpha f(x), \qquad a_\alpha \in C^\infty.$$


If the $a_\alpha$'s are constants, we call $L$ a constant-coefficient operator. In this case,
for all sufficiently well-behaved functions $f$ (for example, $f \in \mathcal{S}$) we have


$$(Lf)^\wedge(\xi) = \sum_{|\alpha|\le m} a_\alpha(2\pi i\xi)^\alpha\,\hat f(\xi).$$


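It is therefore convenient to write $L$ in a slightly different form: We set $b_\alpha =
(2\pi i)^{|\alpha|}a_\alpha$ and introduce the operators

$$D^\alpha = (2\pi i)^{-|\alpha|}\partial^\alpha,$$

so that

$$\sum_{|\alpha|\le m} a_\alpha\partial^\alpha = \sum_{|\alpha|\le m} b_\alpha D^\alpha.$$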
Thus, if $P$ is any polynomial in $n$ complex variables, say $P(\xi) = \sum_{|\alpha|\le m} b_\alpha\xi^\alpha$,
we can form the constant-coefficient operator $P(D) = \sum_{|\alpha|\le m} b_\alpha D^\alpha$, and we then
have $[P(D)f]^\wedge = P\hat f$. The polynomial $P$ is called the symbol of the operator $P(D)$.
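As a numerical sanity check of $[P(D)f]^\wedge = P\hat f$, take $n = 1$ and $P(\xi) = -4\pi^2\xi^2$, so that $P(D) = d^2/dx^2$. Take also $f(x) = e^{-\pi x^2}$, whose transform is $e^{-\pi\xi^2}$ in the convention $\hat f(\xi) = \int f(x)e^{-2\pi ix\xi}\,dx$. The sketch below approximates $\hat f$ by a Riemann sum on $[-L, L)$. The cutoff $L$ and the number of nodes $N$ are assumptions suited only to this rapidly decaying $f$.

```python
import cmath
import math


def fourier_transform(f, xi, L=8.0, N=3200):
    """Approximate hat f(xi) = integral of f(x) exp(-2 pi i x xi) dx by a Riemann sum on [-L, L).

    Adequate only for smooth f that is negligible outside [-L, L]."""
    h = 2 * L / N
    return h * sum(f(-L + k * h) * cmath.exp(-2j * math.pi * (-L + k * h) * xi)
                   for k in range(N))


def gaussian(x):
    """f(x) = exp(-pi x^2); its Fourier transform is exp(-pi xi^2)."""
    return math.exp(-math.pi * x * x)


def gaussian_dd(x):
    """f''(x) = (4 pi^2 x^2 - 2 pi) exp(-pi x^2)."""
    return (4 * math.pi ** 2 * x * x - 2 * math.pi) * math.exp(-math.pi * x * x)


def laplacian_symbol(xi):
    """Symbol of d^2/dx^2 in this convention: P(xi) = -4 pi^2 xi^2."""
    return -4 * math.pi ** 2 * xi ** 2


# (f'')^(xi) should match P(xi) * hat f(xi).
comparisons = [(fourier_transform(gaussian_dd, xi), laplacian_symbol(xi) * gaussian(xi))
               for xi in (0.0, 0.5, 1.3)]
```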

Clearly, one potential application of the Fourier transform is in finding solutions of 
the differential equation $P(D)u = f$. Indeed, application of the Fourier transform to
both sides yields $\hat u = P^{-1}\hat f$, whence $u = (P^{-1}\hat f)^\vee$. Moreover, if $P^{-1}$ is the Fourier
transform of a function $\phi$, we can express $u$ directly in terms of $f$ as $u = f*\phi$. For
these calculations to make sense, however, the functions $f$ and $P^{-1}\hat f$ (or $P^{-1}$) must
be ones to which the Fourier transform can be applied, which is a serious limitation
within the theory we have developed so far. The full power of this method becomes
available only when the domain of the Fourier transform is substantially extended.
We shall do this in §9.2; for the time being, we invite the reader to work out a fairly
simple example in Exercise 43. (It must also be pointed out that even when this
method works, $u = (P^{-1}\hat f)^\vee$ is far from being the only solution of $P(D)u = f$;
there are others that grow too fast at infinity to be within the scope even of the 
extended Fourier transform.) 

Let us turn to some more concrete problems. The most important of all partial 
differential operators is the Laplacian 


$$\Delta = \sum_{j=1}^n \partial_j^2 = -4\pi^2\sum_{j=1}^n D_j^2 = P(D) \quad\text{where } P(\xi) = -4\pi^2|\xi|^2.$$
The reason for this is that $\Delta$ is essentially the only (scalar) differential operator
that is invariant under translations and rotations. (If one considers operators on 
vector-valued functions, there are others, such as the familiar grad, curl, and div of 
3-dimensional vector analysis.) More precisely, we have: 


8.51 Theorem. A differential operator $L$ satisfies $L(f\circ T) = (Lf)\circ T$ for all
translations and rotations $T$ iff there is a polynomial $P$ in one variable such that

$$L = P(\Delta).$$


Proof. Clearly $L$ is translation-invariant iff $L$ has constant coefficients, in which
case $L = Q(D)$ for some polynomial $Q$ in $n$ variables. Moreover, since $(Lf)^\wedge = Q\hat f$
and the Fourier transform commutes with rotations, $L$ commutes with rotations iff $Q$
is rotation-invariant. Let $Q = \sum_{j=0}^m Q_j$ where $Q_j$ is homogeneous of degree $j$; then
it is easy to see that $Q$ is rotation-invariant iff each $Q_j$ is rotation-invariant. (Use
induction on $j$ and the fact that $Q_j(\xi) = \lim_{r\to 0} r^{-j}\sum_{i\ge j} Q_i(r\xi)$.) But this means
that $Q_j(\xi)$ depends only on $|\xi|$, so $Q_j(\xi) = c_j|\xi|^j$ by homogeneity. Moreover, $|\xi|^j$ is a
polynomial precisely when $j$ is even, so $c_j = 0$ for $j$ odd. Setting $b_k = (-4\pi^2)^{-k}c_{2k}$,
then, we have $Q(\xi) = \sum_k b_k(-4\pi^2|\xi|^2)^k$, that is, $L = \sum_k b_k\Delta^k$. ∎


One of the basic boundary value problems for the Laplacian is the Dirichlet
problem: Given an open set $\Omega \subset \mathbb{R}^n$ and a function $f$ on its boundary $\partial\Omega$, find
a function $u$ on $\bar\Omega$ such that $\Delta u = 0$ on $\Omega$ and $u|\partial\Omega = f$. (This statement of the
problem is deliberately a bit imprecise.) We shall solve the Dirichlet problem when
$\Omega$ is a half-space.

For this purpose it will be convenient to replace $n$ by $n+1$ and to denote the
coordinates on $\mathbb{R}^{n+1}$ by $x_1, \dots, x_n, t$. We continue to use the symbol $\Delta$ to denote
the Laplacian on $\mathbb{R}^n$, and we set

$$\partial_t = \frac{\partial}{\partial t},$$

so the Laplacian on $\mathbb{R}^{n+1}$ is $\Delta + \partial_t^2$. We take the half-space to be $\mathbb{R}^n \times (0,\infty)$.
Thus, given a function $f$ on $\mathbb{R}^n$, satisfying conditions to be made more precise
below, we wish to find a function $u$ on $\mathbb{R}^n \times (0,\infty)$ such that $(\Delta + \partial_t^2)u = 0$ and
$u(x,0) = f(x)$.

The idea is to apply the Fourier transform on $\mathbb{R}^n$, thus converting the partial
differential equation $(\Delta + \partial_t^2)u = 0$ into the simple ordinary differential equation
$(-4\pi^2|\xi|^2 + \partial_t^2)\hat u = 0$. The general solution of this equation is


$$\hat u(\xi,t) = c_1(\xi)e^{-2\pi t|\xi|} + c_2(\xi)e^{2\pi t|\xi|}, \tag{8.52}$$


and we require that $\hat u(\xi,0) = \hat f(\xi)$. We therefore obtain a solution to our problem
by taking $c_1(\xi) = \hat f(\xi)$, $c_2(\xi) = 0$ (more about the reasons for this choice below);
this gives $\hat u(\xi,t) = \hat f(\xi)e^{-2\pi t|\xi|}$, or $u(x,t) = (f*P_t)(x)$ where $P_t = (e^{-2\pi t|\xi|})^\vee$
is the Poisson kernel introduced in §8.4. As we calculated in Exercise 26,

$$P_t(x) = \frac{\Gamma(\frac12(n+1))}{\pi^{(n+1)/2}}\,\frac{t}{(|x|^2+t^2)^{(n+1)/2}}.$$
So far this is all formal, since we have not specified conditions on f to ensure that
these manipulations are justified. We now give a precise result. 


8.53 Theorem. Suppose $f \in L^p(\mathbb{R}^n)$ ($1 \le p \le \infty$). Then the function $u(x,t) =
(f*P_t)(x)$ satisfies $(\Delta + \partial_t^2)u = 0$ on $\mathbb{R}^n \times (0,\infty)$, and $\lim_{t\to 0}u(x,t) = f(x)$ for
a.e. $x$ and for every $x$ at which $f$ is continuous. Moreover, $\lim_{t\to 0}\|u(\cdot,t) - f\|_p = 0$
provided $p < \infty$.


Proof. $P_t$ and all of its derivatives are in $L^q(\mathbb{R}^n)$ for $1 \le q \le \infty$, since a rough
calculation shows that $|\partial_x^\alpha P_t(x)| \le C|x|^{-n-1-|\alpha|}$ and $|\partial_t^k P_t(x)| \le C_k|x|^{-n-1}$ for
large $x$. Also, $(\Delta + \partial_t^2)P_t(x) = 0$, as can be verified by direct calculation or (more
easily) by taking the Fourier transform. Hence $f*P_t$ is well defined and

$$(\Delta + \partial_t^2)(f*P_t) = f*(\Delta + \partial_t^2)P_t = 0.$$


Since $P_t(x) = t^{-n}P_1(t^{-1}x)$ and $\int P_1(x)\,dx = \hat P_1(0) = 1$, the remaining assertions
follow from Theorems 8.14 and 8.15. ∎
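The following sketch illustrates Theorem 8.53 for $n = 1$, where $P_t(x) = t/(\pi(x^2+t^2))$. It first checks by finite differences that $P_t$ is harmonic in $(x,t)$. It then computes $u(0,t) = f*P_t(0)$ for the Gaussian $f(x) = e^{-\pi x^2}$ by a Riemann sum, showing that $u(0,t)\to f(0)$. The function names and quadrature parameters are assumptions tuned to this particular $f$.

```python
import math


def poisson_kernel(x, t):
    """P_t(x) for n = 1: Gamma(1)/pi * t / (x^2 + t^2)."""
    return t / (math.pi * (x * x + t * t))


def poisson_extension(f, x, t, L=6.0, N=24000):
    """u(x, t) = (f * P_t)(x) by a Riemann sum over y in [-L, L).

    Assumes f is negligible outside [-L, L] (true for the Gaussian below)."""
    h = 2 * L / N
    return h * sum(f(-L + k * h) * poisson_kernel(x - (-L + k * h), t)
                   for k in range(N))


def laplacian_xt(u, x, t, h=1e-3):
    """Central-difference approximation to (d^2/dx^2 + d^2/dt^2) u at (x, t)."""
    uxx = (u(x + h, t) - 2 * u(x, t) + u(x - h, t)) / h ** 2
    utt = (u(x, t + h) - 2 * u(x, t) + u(x, t - h)) / h ** 2
    return uxx + utt


f = lambda y: math.exp(-math.pi * y * y)

# Boundary behavior at x = 0: the error |u(0, t) - f(0)| should shrink as t -> 0.
errs = [abs(poisson_extension(f, 0.0, t) - f(0.0)) for t in (0.5, 0.1, 0.02)]
```

For this Gaussian one can check on the Fourier side that $u(0,t) = \int e^{-\pi\xi^2}e^{-2\pi t|\xi|}\,d\xi = 1 - 2t + O(t^2)$. The convergence at $x = 0$ is therefore only linear in $t$.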


The function $u(x,t) = (f*P_t)(x)$ is not the only one satisfying the conclusions
of Theorem 8.53; for example, $v(x,t) = u(x,t) + ct$ also works, for any $c \in \mathbb{C}$. For
$f \in L^1$, we could also obtain a large family of solutions by taking $c_2$ in (8.52) to be an
arbitrary function in $C_c^\infty$ and $c_1 = \hat f - c_2$. (But there is no nice convolution formula
for the resulting function $u$, because $e^{2\pi t|\xi|}$ is not the Fourier transform of a function
or even a distribution.) The solution $u(x,t) = (f*P_t)(x)$ is distinguished, however,
by its regularity at infinity; for example, it can be shown that if $f \in BC(\mathbb{R}^n)$, then $u$
is the unique solution in $BC(\mathbb{R}^n \times [0,\infty))$.

The same idea can be used to solve the heat equation 


$$(\partial_t - \Delta)u = 0$$

on $\mathbb{R}^n \times (0,\infty)$ subject to the initial condition $u(x,0) = f(x)$. (Physical inter-
pretation: $u(x,t)$ represents the temperature at position $x$ and time $t$ in a homo-
geneous isotropic medium, given that the temperature at time 0 is $f(x)$.) Indeed,
Fourier transformation leads to the ordinary differential equation $(\partial_t + 4\pi^2|\xi|^2)\hat u = 0$
with initial condition $\hat u(\xi,0) = \hat f(\xi)$. The unique solution of the latter problem is
$\hat u(\xi,t) = \hat f(\xi)e^{-4\pi^2 t|\xi|^2}$. In view of Proposition 8.24, this yields

$$u(x,t) = f*G_t(x), \qquad G_t(x) = (4\pi t)^{-n/2}e^{-|x|^2/4t}.$$


Here we have $G_t(x) = t^{-n/2}G_1(t^{-1/2}x)$, so after the change of variable $s = \sqrt t$,
Theorems 8.14 and 8.15 apply again, and we obtain an exact analogue of Theorem
8.53 for the initial value problem $(\partial_t - \Delta)u = 0$, $u(x,0) = f(x)$. Actually, in the
present case the hypotheses on $f$ can be relaxed considerably because $G_t \in \mathcal{S}$; see
Exercise 44.
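For the heat equation, the sketch below takes $n = 1$ and a Gaussian initial datum $f = G_s$. Since $\hat G_s(\xi) = e^{-4\pi^2 s|\xi|^2}$, the Fourier computation above predicts $u(\cdot,t) = G_s*G_t = G_{s+t}$, and the Riemann-sum convolution reproduces this. A finite-difference residual also confirms that $(\partial_t - \partial_x^2)G_t \approx 0$. The helper names and grid parameters are illustrative assumptions.

```python
import math


def heat_kernel(x, t):
    """G_t(x) = (4 pi t)^(-1/2) exp(-x^2 / (4t)) on R (the case n = 1)."""
    return math.exp(-x * x / (4 * t)) / math.sqrt(4 * math.pi * t)


def heat_solution(f, x, t, L=10.0, N=4000):
    """u(x, t) = (f * G_t)(x) by a Riemann sum; assumes f is negligible off [-L, L]."""
    h = 2 * L / N
    return h * sum(f(-L + k * h) * heat_kernel(x - (-L + k * h), t) for k in range(N))


def heat_residual(u, x, t, h=1e-3):
    """Central-difference approximation to (d/dt - d^2/dx^2) u at (x, t)."""
    ut = (u(x, t + h) - u(x, t - h)) / (2 * h)
    uxx = (u(x + h, t) - 2 * u(x, t) + u(x - h, t)) / h ** 2
    return ut - uxx


# Initial temperature f = G_s; the exact solution at time t is G_{s+t}.
s, t = 0.3, 0.2
f = lambda y: heat_kernel(y, s)
pairs = [(heat_solution(f, x, t), heat_kernel(x, s + t)) for x in (0.0, 0.7, -1.5)]
```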

Another fundamental equation of mathematical physics is the wave equation 


$$(\partial_t^2 - \Delta)u = 0.$$
(Physical interpretation: $u(x,t)$ is the amplitude at position $x$ and time $t$ of a wave
traveling in a homogeneous isotropic medium, with units chosen so that the speed of
propagation is 1.) Here it is appropriate to specify both $u(x,0)$ and $\partial_t u(x,0)$:


$$(\partial_t^2 - \Delta)u = 0, \qquad u(x,0) = f(x), \qquad \partial_t u(x,0) = g(x). \tag{8.54}$$


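After applying the Fourier transform, we obtain

$$(\partial_t^2 + 4\pi^2|\xi|^2)\hat u(\xi,t) = 0, \qquad \hat u(\xi,0) = \hat f(\xi), \qquad \partial_t\hat u(\xi,0) = \hat g(\xi),$$

the solution to which is

$$\hat u(\xi,t) = \hat f(\xi)\cos 2\pi t|\xi| + \hat g(\xi)\,\frac{\sin 2\pi t|\xi|}{2\pi|\xi|}. \tag{8.55}$$

Since

$$\cos 2\pi t|\xi| = \partial_t\left[\frac{\sin 2\pi t|\xi|}{2\pi|\xi|}\right],$$

it follows that

$$u(x,t) = f*\partial_t W_t(x) + g*W_t(x), \qquad\text{where } W_t = \left[\frac{\sin 2\pi t|\xi|}{2\pi|\xi|}\right]^\vee.$$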


But here there is a problem: $(2\pi|\xi|)^{-1}\sin 2\pi t|\xi|$ is the Fourier transform of a function
only when $n \le 2$ and the Fourier transform of a measure only when $n \le 3$; for these
cases the resulting solution of the wave equation is worked out in Exercises 45-47.
To carry out this analysis in higher dimensions requires the theory of distributions,
which we shall examine in Chapter 9. (We shall not, however, derive the explicit
formula for $W_t$, which becomes increasingly complicated as $n$ increases.)
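In one dimension, formula (8.55) can be inverted numerically and compared with d'Alembert's formula from Exercise 45. The sketch below assumes $f = g = e^{-\pi x^2}$, so that $\hat f = \hat g = e^{-\pi\xi^2}$ and an antiderivative of $g$ is $\frac12\operatorname{erf}(\sqrt\pi\,x)$. The function names and quadrature parameters are hypothetical choices. The multiplier $(2\pi|\xi|)^{-1}\sin 2\pi t|\xi|$ is even in $\xi$ and has the removable value $t$ at $\xi = 0$.

```python
import cmath
import math


def wave_fourier(f_hat, g_hat, x, t, L=8.0, N=3200):
    """Invert (8.55) for n = 1 by a Riemann sum over xi in [-L, L):
    hat u = hat f * cos(2 pi t|xi|) + hat g * sin(2 pi t|xi|) / (2 pi |xi|)."""
    h = 2 * L / N
    total = 0j
    for k in range(N):
        xi = -L + k * h
        # sin(2 pi t |xi|)/(2 pi |xi|) is even in xi and tends to t as xi -> 0.
        s = t if abs(xi) < 1e-12 else math.sin(2 * math.pi * t * xi) / (2 * math.pi * xi)
        u_hat = math.cos(2 * math.pi * t * xi) * f_hat(xi) + s * g_hat(xi)
        total += u_hat * cmath.exp(2j * math.pi * x * xi)
    return (h * total).real


def dalembert(f, G, x, t):
    """d'Alembert's formula, with G an antiderivative of the initial velocity g."""
    return 0.5 * (f(x + t) + f(x - t)) + 0.5 * (G(x + t) - G(x - t))


# Gaussian data: exp(-pi x^2) is its own Fourier transform.
gauss = lambda x: math.exp(-math.pi * x * x)
G = lambda x: 0.5 * math.erf(math.sqrt(math.pi) * x)   # G' = gauss
```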


Exercises 


43. Let $\phi(x) = \frac12 e^{-|x|}$ on $\mathbb{R}$. Use the Fourier transform to derive the solution
$u = f*\phi$ of the differential equation $u - u'' = f$, and then check directly that it
works. What hypotheses are needed on $f$?

44. Let $G_t(x) = (4\pi t)^{-n/2}e^{-|x|^2/4t}$, and suppose that $f \in L^1_{\mathrm{loc}}(\mathbb{R}^n)$ satisfies
$|f(x)| \le C_\epsilon e^{\epsilon|x|^2}$ for every $\epsilon > 0$. Then $u(x,t) = f*G_t(x)$ is well defined for all
$x \in \mathbb{R}^n$ and $t > 0$; $(\partial_t - \Delta)u = 0$ on $\mathbb{R}^n \times (0,\infty)$; and $\lim_{t\to 0}u(x,t) = f(x)$ for
a.e. $x$ and for every $x$ at which $f$ is continuous. (To show $u(x,t) \to f(x)$ a.e. on a
bounded open set $V$, write $f = \phi f + (1-\phi)f$ where $\phi \in C_c$ and $\phi = 1$ on $V$, and
show that $[(1-\phi)f]*G_t \to 0$ on $V$.)


45. Let $n = 1$. Use (8.55) and Exercise 15a to derive d'Alembert's solution to the
initial value problem (8.54):

$$u(x,t) = \tfrac12[f(x+t) + f(x-t)] + \tfrac12\int_{x-t}^{x+t} g(s)\,ds.$$
Under what conditions on f and g does this formula actually give a solution?


46. Let $n = 3$, and let $\sigma_t$ denote surface measure on the sphere $|x| = t$. Then

$$\frac{\sin 2\pi t|\xi|}{2\pi|\xi|} = (4\pi t)^{-1}\hat\sigma_t(\xi).$$

(See Exercise 22d.) What is the resulting solution of the initial value problem (8.54),
expressed in terms of convolutions? What conditions on $f$ and $g$ ensure its validity?


47. Let $n = 2$. If $\xi \in \mathbb{R}^2$, let $\tilde\xi = (\xi,0) \in \mathbb{R}^3$. Rewrite the result of Exercise 46,

$$\frac{\sin 2\pi t|\xi|}{2\pi|\xi|} = \frac{\sin 2\pi t|\tilde\xi|}{2\pi|\tilde\xi|} = \frac{1}{4\pi t}\int_{|x|=t} e^{-2\pi i\tilde\xi\cdot x}\,d\sigma_t(x),$$

in terms of an integral over the disc $D_t = \{y : |y| < t\}$ in $\mathbb{R}^2$ by projecting the
upper and lower hemispheres of the sphere $|x| = t$ in $\mathbb{R}^3$ onto the equatorial plane.
Conclude that $(2\pi|\xi|)^{-1}\sin 2\pi t|\xi|$ is the Fourier transform of

$$W_t(x) = (2\pi)^{-1}(t^2 - |x|^2)^{-1/2}\chi_{D_t}(x),$$

and write out the resulting solution of the initial value problem (8.54).


48. Solve the following initial value problems in terms of Fourier series, where $f$, $g$,
and $u(\cdot,t)$ are periodic functions on $\mathbb{R}$:
a. $(\partial_t^2 + \partial_x^2)u = 0$, $u(x,0) = f(x)$. (Cf. the discussion of Abel means in §8.4.)
b. $(\partial_t - \partial_x^2)u = 0$, $u(x,0) = f(x)$.
c. $(\partial_t^2 - \partial_x^2)u = 0$, $u(x,0) = f(x)$, $\partial_t u(x,0) = g(x)$.


49. In this exercise we discuss heat flow on an interval.

a. Solve $(\partial_t - \partial_x^2)u = 0$ on $(a,b) \times (0,\infty)$ with boundary conditions $u(x,0) =
f(x)$ for $x \in (a,b)$, $u(a,t) = u(b,t) = 0$ for $t > 0$, in terms of Fourier series.
(This describes heat flow on $(a,b)$ when the endpoints are held at a constant
temperature. It suffices to assume $a = 0$, $b = \frac12$; extend $f$ to $\mathbb{R}$ by requiring $f$ to
be odd and periodic, and use Exercise 48b.)

b. Solve the same problem with the condition $u(a,t) = u(b,t) = 0$ replaced
by $\partial_x u(a,t) = \partial_x u(b,t) = 0$. (This describes heat flow on $(a,b)$ when the
endpoints are insulated. This time, extend $f$ to be even and periodic.)


50. Solve $(\partial_t^2 - \partial_x^2)u = 0$ on $(a,b) \times (0,\infty)$ with boundary conditions $u(x,0) =
f(x)$ and $\partial_t u(x,0) = g(x)$ for $x \in (a,b)$, $u(a,t) = u(b,t) = 0$ for $t > 0$, in terms
of Fourier series by the method of Exercise 49a. (This problem describes the motion
of a vibrating string that is fixed at the endpoints. It can also be solved by extending
$f$ to be odd and periodic and using Exercise 45. That form of the solution tells you
what you see when you look at a vibrating string; this one tells you what you hear
when you listen to it.)
8.8 NOTES AND REFERENCES


The scope of Fourier analysis is much wider than we have been able to indicate in 
this chapter. Dym and McKean [36] gives a more comprehensive treatment with 
many interesting applications. Also recommended are Körner’s delightful book [87], 
which discusses various aspects of classical Fourier analysis and their role in science, 
and the excellent collection of expository articles edited by Ash [7], which gives a 
broader view of the mathematical ramifications of the subject. On the more advanced 
level, the reader should consult Zygmund [167] for the classical theory and Stein 
[140], [141] and Stein and Weiss [142] for some of the more recent developments. 


§8.1: The formulas given in most calculus books for the remainder term $R_k(x) =
f(x) - P_k(x)$ in Taylor's formula (where $P_k$ is the Taylor polynomial of degree $k$)
require f to possess derivatives of order k + 1, but this is not really necessary. The 
version of Taylor’s theorem stated in the text is derived in Folland [45]. 


§8.3: Trigonometric series and integrals have a very long history, but modern
Fourier analysis only became possible after the invention of the Lebesgue integral.
When that tool became available, the $L^2$ theory was quickly established: the
Riesz-Fischer theorem [44], [114] for Fourier series (essentially Theorem 8.20), and the
Plancherel theorem [110] for Fourier integrals. Since then the subject has developed 
in many directions. 

There is no universal agreement on where to put the factors of $2\pi$ in the definition
of the Fourier transform. Other common conventions are 


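$$\mathcal{F}_1 f(\xi) = \int e^{-i\xi\cdot x}f(x)\,dx, \qquad \mathcal{F}_2 f(\xi) = (2\pi)^{-n/2}\int e^{-i\xi\cdot x}f(x)\,dx,$$

whose inverse transforms are

$$\mathcal{F}_1^{-1}g(x) = (2\pi)^{-n}\int e^{i\xi\cdot x}g(\xi)\,d\xi, \qquad \mathcal{F}_2^{-1}g(x) = (2\pi)^{-n/2}\int e^{i\xi\cdot x}g(\xi)\,d\xi.$$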


$\mathcal{F}_1$ has the disadvantage of not being unitary ($\|\mathcal{F}_1 f\|_2 = (2\pi)^{n/2}\|f\|_2$), whereas
$\mathcal{F}_2$ is unitary but does not convert convolution into multiplication ($\mathcal{F}_2(f*g) =
(2\pi)^{n/2}(\mathcal{F}_2 f)(\mathcal{F}_2 g)$). To make both $L^2$ norms and convolutions come out right,
one can either put the $2\pi$'s in the exponent, as we have done, or omit them from
the exponent but replace Lebesgue measure $dx$ by $(2\pi)^{-n/2}dx$ in defining both the
Fourier transform (as in $\mathcal{F}_2$) and convolutions.
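These normalization factors can be checked numerically for $n = 1$ with $f(x) = e^{-x^2/2}$. The sketch approximates each transform by a Riemann sum on a fixed grid, an assumption that works here only because everything decays rapidly, and then compares $L^2$ norms. The ratio $\|\mathcal{F}f\|_2/\|f\|_2$ should be close to $1$ both for the convention with $2\pi$ in the exponent and for $\mathcal{F}_2$, and close to $\sqrt{2\pi}$ for $\mathcal{F}_1$. The grid and the helper names are hypothetical.

```python
import cmath
import math

# Common grid for x and xi (step 0.05 on [-8, 8)).
L, N = 8.0, 320
h = 2 * L / N
grid = [-L + k * h for k in range(N)]


def transform(values, xi, scale, prefactor=1.0):
    """prefactor * integral of exp(-i*scale*x*xi) f(x) dx, by a Riemann sum on the grid."""
    return prefactor * h * sum(v * cmath.exp(-1j * scale * x * xi)
                               for x, v in zip(grid, values))


def l2_norm(values):
    """Riemann-sum approximation to the L^2 norm of sampled values."""
    return math.sqrt(h * sum(abs(v) ** 2 for v in values))


f_vals = [math.exp(-x * x / 2) for x in grid]
F_ours = [transform(f_vals, xi, 2 * math.pi) for xi in grid]               # exp(-2 pi i x xi)
F_1 = [transform(f_vals, xi, 1.0) for xi in grid]                         # exp(-i x xi)
F_2 = [transform(f_vals, xi, 1.0, (2 * math.pi) ** -0.5) for xi in grid]  # unitary, no 2 pi in exponent

ratios = [l2_norm(F) / l2_norm(f_vals) for F in (F_ours, F_1, F_2)]
```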

The Hausdorff-Young inequality $\|\hat f\|_q \le \|f\|_p$ ($1 \le p \le 2$, $p^{-1} + q^{-1} = 1$)
is sharp on $\mathbb{T}^n$, since equality holds when $f$ is a constant function; but on $\mathbb{R}^n$ the
optimal result, a deep theorem of Beckner [14], is that $\|\hat f\|_q \le p^{n/2p}q^{-n/2q}\|f\|_p$.

One of the fundamental qualitative features of the Fourier transform is the fact 
that, roughly speaking, a nonzero function and its Fourier transform cannot both be 
sharply localized, that is, they cannot both be negligibly small outside of small sets. 
This general principle has a number of different precise formulations, two of which 
are derived in Exercises 18 and 19; see Folland and Sitaram [50] for a comprehensive 
discussion. 
A nice complex-variable proof of the fact that the Fourier transform is injective
on $L^1$ can be found in Newman [106].


§§8.4-5: The theory of convergence of one-dimensional Fourier series really
began (as mentioned in the text) with Dirichlet's theorem in 1829. The first construc-
tion of a continuous function whose Fourier series does not converge pointwise was
obtained by du Bois Reymond in 1876, and the fact that the Fourier series of a con-
tinuous function $f$ is uniformly Cesàro summable to $f$ was proved by Fejér in 1904.
In 1926 Kolmogorov produced an $f \in L^1(\mathbb{T})$ such that $\{S_m f(x)\}$ diverges at every
$x$; on the other hand, in 1927 M. Riesz proved that for $1 < p < \infty$, $\|S_m f - f\|_p \to 0$
for every $f \in L^p(\mathbb{T})$. The culmination of this subject is the theorem of Carleson
(1966, for $p = 2$) and Hunt (1967, for general $p$) that if $f \in L^p(\mathbb{T})$ where $p > 1$,
then $S_m f \to f$ almost everywhere.

For more information, see Zygmund [167], the articles by Zygmund and Hunt 
in Ash [7], and Fefferman [42]. Also see Hewitt and Hewitt [74] for an interesting 
historical discussion of the Gibbs phenomenon. 

Convergence of Fourier series in n variables is an even trickier subject. In the first 
place, one must decide what one means by a partial sum of a series indexed by $\mathbb{Z}^n$.
It is a straightforward consequence of the Riesz and Carleson-Hunt theorems that if
$f \in L^p(\mathbb{T}^n)$ with $p > 1$, the “cubical partial sums”

$$S_m f(x) = \sum_{\|k\|\le m}\hat f(k)e^{2\pi ik\cdot x} \qquad (\|k\| = \max(|k_1|, \dots, |k_n|))$$

converge to $f$ a.e. and (if $p < \infty$) in the $L^p$ norm. On the other hand, C. Fefferman
proved the rather shocking result that for the “spherical partial sums”

$$S_r f(x) = \sum_{|k|\le r}\hat f(k)e^{2\pi ik\cdot x} \qquad (|k| = (k_1^2 + \cdots + k_n^2)^{1/2}),$$

the convergence $\lim_{r\to\infty}\|S_r f - f\|_p = 0$ holds for all $f \in L^p$ only when $p = 2$,
if n > 1. Of course, one can consider modifications of the spherical partial sums in 
the hope of obtaining positive results; the most intensively studied of these are the 
Bochner-Riesz means 


$$\sigma_r^a f(x) = \sum_{|k|\le r}\Big(1 - \frac{|k|^2}{r^2}\Big)^a\hat f(k)e^{2\pi ik\cdot x}$$

obtained by taking $\Phi(\xi) = [\max(1 - |\xi|^2, 0)]^a$ in Theorem 8.36. (When $n = 1$,
$\sigma^1_{m+1}f$ is essentially equivalent to the Cesàro mean $\sigma_m f$.) These $\Phi$'s satisfy the
hypotheses of Theorem 8.36 when $a > \frac12(n-1)$, and some positive results are also
known for smaller values of $a$.


Davis and Chang [30] is a good source for all of this material; see also Stein and 
Weiss [142] and Ash [7]. 


§8.7: The solution of the initial value problem for the wave equation in arbitrary
dimensions can be found in Folland [48]; see also Folland [49]. Further applications 
of Fourier analysis to differential equations can be found in Folland [46], [48], Körner 
[87], and Taylor [147]. 





Elements of Distribution Theory


At least as far back as Heaviside in the 1890s, engineers and physicists have found 
it convenient to consider mathematical objects which, roughly speaking, resemble 
functions but are more singular than functions. Despite their evident efficacy, such 
objects were at first received with disdain and perplexity by the pure mathematicians, 
and one of the most important conceptual advances in modern analysis is the devel- 
opment of methods for dealing with them in a rigorous and systematic way. The 
method that has proved to be most generally useful is Laurent Schwartz’s theory of 
distributions, based on the idea of linear functionals on test functions. For some 
purposes, however, it is preferable to use a theory more closely tied to $L^2$ on which
the power of Hilbert space methods and the Plancherel theorem can be brought to
bear, namely, the ($L^2$) Sobolev spaces. In this chapter we present the fundamentals
of these theories and some of their applications. 


9.1 DISTRIBUTIONS 


In order to find a fruitful generalization of the notion of function on $\mathbb{R}^n$, it is necessary
to get away from the classical definition of function as a map that assigns to each
point of $\mathbb{R}^n$ a numerical value. We have already done this to some extent in the
theory of $L^p$ spaces: If $f \in L^p$, the pointwise values $f(x)$ are of little significance
for the behavior of $f$ as an element of $L^p$, as $f$ can be modified on any set of measure
zero without affecting the latter. What is more to the point is the family of integrals
$\int f\phi$ as $\phi$ ranges over the dual space $L^q$. Indeed, we know that $f$ is completely
determined by its action as a linear functional on $L^q$; on the other hand, if we take
$\phi = \phi_r = m(B_r)^{-1}\chi_{B_r}$, where $B_r$ is the ball of radius $r$ about $x$, by the Lebesgue
differentiation theorem we can recover the pointwise value $f(x)$, for almost every
$x$, as $\lim_{r\to 0}\int f\phi_r$. Thus, we lose nothing by thinking of $f$ as a linear map from
$L^q(\mathbb{R}^n)$ to $\mathbb{C}$ rather than as a map from $\mathbb{R}^n$ to $\mathbb{C}$.

Let us modify this idea by allowing $f$ to be merely locally integrable on $\mathbb{R}^n$ but
requiring $\phi$ to lie in $C_c^\infty$. Again the map $\phi \mapsto \int f\phi$ is a well-defined linear functional
on $C_c^\infty$, and again the pointwise values of $f$ can be recovered a.e. from it, by an easy
extension of Theorem 8.15. But there are many linear functionals on $C_c^\infty$ that are
not of the form $\phi \mapsto \int f\phi$, and these (subject to a mild continuity condition to be
specified below) will be our “generalized functions.”

Recall that for $E \subset \mathbb{R}^n$ we have defined $C_c^\infty(E)$ to be the set of all $C^\infty$ functions
whose support is compact and contained in $E$. If $U \subset \mathbb{R}^n$ is open, $C_c^\infty(U)$ is the
union of the spaces $C_c^\infty(K)$ as $K$ ranges over all compact subsets of $U$. Each of the
latter is a Fréchet space with the topology defined by the norms

$$\phi \mapsto \|\partial^\alpha\phi\|_u \qquad (\alpha \in \{0, 1, 2, \dots\}^n),$$

in which a sequence $\{\phi_j\}$ converges to $\phi$ iff $\partial^\alpha\phi_j \to \partial^\alpha\phi$ uniformly for all $\alpha$. (The
completeness of $C_c^\infty(K)$ is easily proved by the argument in Exercise 9 in §5.1.)
With this in mind, we make the following definitions, in which $U$ is an open subset
of $\mathbb{R}^n$:


i. A sequence $\{\phi_j\}$ in $C_c^\infty(U)$ converges in $C_c^\infty$ to $\phi$ if $\{\phi_j\} \subset C_c^\infty(K)$ for
some compact set $K \subset U$ and $\phi_j \to \phi$ in the topology of $C_c^\infty(K)$, that is,
$\partial^\alpha\phi_j \to \partial^\alpha\phi$ uniformly for all $\alpha$.

ii. If $X$ is a locally convex topological vector space and $T : C_c^\infty(U) \to X$ is
a linear map, $T$ is continuous if $T|C_c^\infty(K)$ is continuous for each compact
$K \subset U$, that is, if $T\phi_j \to T\phi$ whenever $\phi_j \to \phi$ in $C_c^\infty(K)$ and $K \subset U$ is
compact.

iii. A linear map $T : C_c^\infty(U) \to C_c^\infty(U')$ is continuous if for each compact
$K \subset U$ there is a compact $K' \subset U'$ such that $T(C_c^\infty(K)) \subset C_c^\infty(K')$, and $T$
is continuous from $C_c^\infty(K)$ to $C_c^\infty(K')$.

iv. A distribution on $U$ is a continuous linear functional on $C_c^\infty(U)$. The space
of all distributions on $U$ is denoted by $\mathcal{D}'(U)$, and we set $\mathcal{D}' = \mathcal{D}'(\mathbb{R}^n)$.
We impose the weak* topology on $\mathcal{D}'(U)$, that is, the topology of pointwise
convergence on $C_c^\infty(U)$.


Two remarks: First, the standard notation $\mathcal{D}'$ for the space of distributions comes
from Schwartz's notation $\mathcal{D}$ for $C_c^\infty$, which is also quite common. Second, there is
a locally convex topology on $C_c^\infty$ with respect to which sequential convergence in
$C_c^\infty$ is given by (i) and continuity of linear maps $T : C_c^\infty \to X$ and $T : C_c^\infty \to C_c^\infty$
is given by (ii) and (iii). However, its definition is rather complicated and of little
importance for the elementary theory of distributions, so we shall omit it. 

Here are some examples of distributions; more will be presented below. 
• Every $f \in L^1_{\mathrm{loc}}(U)$, that is, every function $f$ on $U$ such that $\int_K |f| < \infty$ for
every compact $K \subset U$, defines a distribution on $U$, namely, the functional
$\phi \mapsto \int f\phi$, and two functions define the same distribution precisely when they
are equal a.e.

• Every Radon measure $\mu$ on $U$ defines a distribution by $\phi \mapsto \int\phi\,d\mu$.

• If $x_0 \in U$ and $\alpha$ is a multi-index, the map $\phi \mapsto \partial^\alpha\phi(x_0)$ is a distribution
that does not arise from a function; it arises from a measure $\mu$ precisely when
$\alpha = 0$, in which case $\mu$ is the point mass at $x_0$.


If $f \in L^1_{\mathrm{loc}}(U)$, we denote the distribution $\phi \mapsto \int f\phi$ also by $f$, thereby identifying
$L^1_{\mathrm{loc}}(U)$ with a subspace of $\mathcal{D}'(U)$. In order to avoid notational confusion between
$f(x)$ and $f(\phi) = \int f\phi$, we adopt a different notation for the pairing between $C_c^\infty(U)$
and $\mathcal{D}'(U)$. Namely, if $F \in \mathcal{D}'(U)$ and $\phi \in C_c^\infty(U)$, the value of $F$ at $\phi$ will be
denoted by $\langle F,\phi\rangle$. Observe that the pairing $\langle\cdot,\cdot\rangle$ between $\mathcal{D}'(U)$ and $C_c^\infty(U)$ is
linear in each variable; this conflicts with our earlier notation for inner products but
will cause no serious confusion. If $\mu$ is a measure, we shall also identify $\mu$ with the
distribution $\phi \mapsto \int\phi\,d\mu$.

Sometimes it is convenient to pretend that a distribution $F$ is a function even
when it really is not, and to write $\int F(x)\phi(x)\,dx$ instead of $\langle F,\phi\rangle$. This is the case
especially when the explicit presence of the variable $x$ is notationally helpful.

At this point we set forth two pieces of notation that will be used consistently 
throughout this chapter. First, we shall use a tilde to denote the reflection of a 
function in the origin: 


$$\tilde\phi(x) = \phi(-x).$$

Second, we denote the point mass at the origin, which plays a central role in distri-
bution theory, by $\delta$:

$$\langle\delta,\phi\rangle = \phi(0).$$


As an illustration of the role of $\delta$ and the notion of convergence in $\mathcal{D}'$, we record
the following important corollary of Theorem 8.14:


9.1 Proposition. Suppose that $f \in L^1(\mathbb{R}^n)$ and $\int f = a$, and for $t > 0$ let $f_t(x) =
t^{-n}f(t^{-1}x)$. Then $f_t \to a\delta$ in $\mathcal{D}'$ as $t \to 0$.


Proof. If $\phi \in C_c^\infty$, by Theorem 8.14 we have

$$\langle f_t,\phi\rangle = \int f_t\phi = f_t*\tilde\phi(0) \to a\tilde\phi(0) = a\phi(0) = \langle a\delta,\phi\rangle.$$ ∎
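Proposition 9.1 can be illustrated in one dimension. Take a Gaussian $f$ with integral $a = 2$, and the standard bump test function $\phi(x) = e^{-1/(1-x^2)}$ for $|x| < 1$ (and $0$ otherwise). The sketch below evaluates $\langle f_t,\phi\rangle$ by the midpoint rule and shows its distance to $a\phi(0)$ shrinking as $t \to 0$. The helper names and quadrature parameters are hypothetical.

```python
import math


def bump(x):
    """A C_c^infinity test function supported in [-1, 1]; bump(0) = e^{-1}."""
    return math.exp(-1.0 / (1.0 - x * x)) if abs(x) < 1 else 0.0


def pair(F, phi, lo=-1.0, hi=1.0, n=4000):
    """<F, phi> = integral of F phi by the midpoint rule on [lo, hi] (supp phi in [lo, hi])."""
    step = (hi - lo) / n
    return step * sum(F(lo + (k + 0.5) * step) * phi(lo + (k + 0.5) * step)
                      for k in range(n))


def dilate(f, t):
    """f_t(x) = t^{-1} f(x / t), the n = 1 case of f_t(x) = t^{-n} f(t^{-1} x)."""
    return lambda x: f(x / t) / t


mass = 2.0
f = lambda x: mass * math.exp(-math.pi * x * x)    # integral of f = mass, so f_t -> mass * delta
ts = (0.5, 0.1, 0.02)
errors = [abs(pair(dilate(f, t), bump) - mass * bump(0.0)) for t in ts]
```

For this choice a Taylor expansion of $\phi$ at $0$ shows that the error is approximately $a\,t^2/(2\pi e)$ for small $t$.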


Although it does not make sense to say that two distributions $F$ and $G$ in $\mathcal{D}'(U)$
agree at a single point, it does make sense to say that they agree on an open set
$V \subset U$; namely, $F = G$ on $V$ iff $\langle F,\phi\rangle = \langle G,\phi\rangle$ for all $\phi \in C_c^\infty(V)$. (Clearly, if $F$
and $G$ are continuous functions, this condition is equivalent to the pointwise equality
of $F$ and $G$ on $V$; if $F$ and $G$ are merely locally integrable, it means that $F = G$ a.e.
on $V$.) Since a function in $C_c^\infty(V_1 \cup V_2)$ need not be supported in either $V_1$ or $V_2$, it
is not immediately obvious that if $F = G$ on $V_1$ and on $V_2$ then $F = G$ on $V_1 \cup V_2$.
However, it is true:


9.2 Proposition. Let $\{V_\alpha\}$ be a collection of open subsets of $U$ and let $V = \bigcup V_\alpha$.
If $F, G \in \mathcal{D}'(U)$ and $F = G$ on each $V_\alpha$, then $F = G$ on $V$.


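Proof. If $\phi \in C_c^\infty(V)$, there exist $\alpha_1, \dots, \alpha_m$ such that $\operatorname{supp}\phi \subset \bigcup_1^m V_{\alpha_j}$. Pick
$\psi_1, \dots, \psi_m \in C_c^\infty$ such that $\operatorname{supp}(\psi_j) \subset V_{\alpha_j}$ and $\sum_1^m \psi_j = 1$ on $\operatorname{supp}(\phi)$. (That
this can be done is the $C^\infty$ analogue of Proposition 4.41, proved in the same way
as that result by using the $C^\infty$ Urysohn lemma.) Then $\langle F,\phi\rangle = \sum\langle F,\psi_j\phi\rangle =
\sum\langle G,\psi_j\phi\rangle = \langle G,\phi\rangle$. ∎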


According to Proposition 9.2, if $F \in \mathcal{D}'(U)$, there is a maximal open subset of
$U$ on which $F = 0$, namely the union of all the open subsets on which $F = 0$. Its
complement in $U$ is called the support of $F$.

There is a general procedure for extending various linear operations from functions
to distributions. Suppose that $U$ and $V$ are open sets in $\mathbb{R}^n$, and $T$ is a linear map
from some subspace $X$ of $L^1_{\mathrm{loc}}(U)$ into $L^1_{\mathrm{loc}}(V)$. Suppose that there is another linear
map $T' : C_c^\infty(V) \to C_c^\infty(U)$ such that

$$\int (Tf)\phi = \int f(T'\phi) \qquad (f \in X,\ \phi \in C_c^\infty(V)).$$


Suppose also that $T'$ is continuous in the sense defined above. Then $T$ can be
extended to a map from $\mathcal{D}'(U)$ to $\mathcal{D}'(V)$, still denoted by $T$, by

$$\langle TF,\phi\rangle = \langle F,T'\phi\rangle \qquad (F \in \mathcal{D}'(U),\ \phi \in C_c^\infty(V)).$$

The intervention of the continuous map $T'$ guarantees that the original $T$, as well as
its extension to distributions, is continuous with respect to the weak* topology on
distributions: If $F_\alpha \to F$ in $\mathcal{D}'(U)$, then $TF_\alpha \to TF$ in $\mathcal{D}'(V)$.

Here are the most important instances of this procedure. In each of them, $U$ is
an open set in $\mathbb{R}^n$, and the continuity of $T'$ is an easy exercise that we leave to the
reader.


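i. (Differentiation) Let $Tf = \partial^\alpha f$, defined on $C^{|\alpha|}(U)$. If $\phi \in C_c^\infty(U)$, inte-
gration by parts gives $\int(\partial^\alpha f)\phi = (-1)^{|\alpha|}\int f(\partial^\alpha\phi)$; there are no boundary
terms since $\phi$ has compact support. Hence $T' = (-1)^{|\alpha|}T|C_c^\infty(U)$, and we
can define the derivative $\partial^\alpha F \in \mathcal{D}'(U)$ of any $F \in \mathcal{D}'(U)$ by

$$\langle\partial^\alpha F,\phi\rangle = (-1)^{|\alpha|}\langle F,\partial^\alpha\phi\rangle.$$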


Notice, in particular, that by this procedure we can define derivatives of arbitrary 
locally integrable functions even when they are not differentiable in the classical 
sense; this is one of the main reasons for the power of distribution theory. We 
shall discuss this matter in more detail below. 








ii. (Multiplication by Smooth Functions) Given $\psi \in C^\infty(U)$, define $Tf = \psi f$.
Then $T' = T|C_c^\infty(U)$, so we can define the product $\psi F \in \mathcal{D}'(U)$ for $F \in
\mathcal{D}'(U)$ by

$$\langle\psi F,\phi\rangle = \langle F,\psi\phi\rangle.$$

Moreover, if $\psi \in C_c^\infty(U)$, this formula makes sense for any $\phi \in C^\infty(\mathbb{R}^n)$ and
defines $\psi F$ as a distribution on $\mathbb{R}^n$.

iii. (Translation) Given $y \in \mathbb{R}^n$, let $V = U + y = \{x + y : x \in U\}$ and
let $T = \tau_y$. (Recall that we have defined $\tau_y f(x) = f(x - y)$.) Since
$\int f(x-y)\phi(x)\,dx = \int f(x)\phi(x+y)\,dx$, we have $T' = \tau_{-y}|C_c^\infty(U + y)$.
For $F \in \mathcal{D}'(U)$, then, we define the translated distribution $\tau_y F \in \mathcal{D}'(U + y)$
by

$$\langle\tau_y F,\phi\rangle = \langle F,\tau_{-y}\phi\rangle.$$

For example, the point mass at $y$ is $\tau_y\delta$.


iv. (Composition with Linear Maps) Given an invertible linear transformation $S$
of $\mathbb{R}^n$, let $V = S^{-1}(U)$ and let $Tf = f\circ S$. Then $T'\phi = |\det S|^{-1}\phi\circ S^{-1}$
by Theorem 2.44, so for $F \in \mathcal{D}'(U)$ we define $F\circ S \in \mathcal{D}'(S^{-1}(U))$ by

$$\langle F\circ S,\phi\rangle = |\det S|^{-1}\langle F,\phi\circ S^{-1}\rangle.$$

In particular, for $Sx = -x$ we have $f\circ S = \tilde f$, $S^{-1} = S$, and $|\det S| = 1$, so
we define the reflection of a distribution in the origin by

$$\langle\tilde F,\phi\rangle = \langle F,\tilde\phi\rangle.$$


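v. (Convolution, First Method) Given $\psi \in C_c^\infty$, let

$$V = \{x : x - y \in U \text{ for } y \in \operatorname{supp}(\psi)\}.$$

($V$ is open but may be empty.) If $f \in L^1_{\mathrm{loc}}(U)$, the integral

$$f*\psi(x) = \int f(x-y)\psi(y)\,dy = \int f(y)\psi(x-y)\,dy = \int f\,(\tau_x\tilde\psi)$$

is well defined for all $x \in V$. The same definition works for $F \in \mathcal{D}'(U)$: the
convolution $F*\psi$ is the function defined on $V$ by

$$F*\psi(x) = \langle F,\tau_x\tilde\psi\rangle.$$

Since $\tau_x\tilde\psi \to \tau_{x_0}\tilde\psi$ in $C_c^\infty$ as $x \to x_0$, $F*\psi$ is a continuous function (actually
$C^\infty$, as we shall soon see) on $V$. As an example, for any $\psi \in C_c^\infty$ we have

$$\delta*\psi(x) = \langle\delta,\tau_x\tilde\psi\rangle = \tau_x\tilde\psi(0) = \psi(x),$$

so $\delta$ is the multiplicative identity for convolution.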
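vi. (Convolution, Second Method) Let $U$, $\psi$, and $V$ be as in (v). If $f \in L^1_{\mathrm{loc}}(U)$
and $\phi \in C_c^\infty(V)$, we have

$$\int (f*\psi)\phi = \iint f(y)\psi(x-y)\phi(x)\,dy\,dx = \int f\,(\phi*\tilde\psi).$$

That is, if $Tf = f*\psi$, then $T$ maps $L^1_{\mathrm{loc}}(U)$ into $L^1_{\mathrm{loc}}(V)$ and $T'\phi = \phi*\tilde\psi$.
For $F \in \mathcal{D}'(U)$, we can therefore define $F*\psi$ as a distribution on $V$ by

$$\langle F*\psi,\phi\rangle = \langle F,\phi*\tilde\psi\rangle.$$

Again, we have $\delta*\psi = \psi$, for

$$\langle\delta*\psi,\phi\rangle = \langle\delta,\phi*\tilde\psi\rangle = \phi*\tilde\psi(0) = \int\phi(x)\psi(x)\,dx = \langle\psi,\phi\rangle.$$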
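The differentiation rule in (i) can be checked numerically on two standard one-dimensional examples, with the derivatives of the test functions computed in closed form. The Heaviside function $H = \chi_{(0,\infty)}$ has $H' = \delta$, since $-\langle H,\phi'\rangle = -\int_0^\infty\phi' = \phi(0)$. The function $|x|$ has derivative $\operatorname{sgn} x$. The midpoint-rule integration, the helper names, and the particular test functions are assumptions made for illustration.

```python
import math


def bump(x):
    """C_c^infinity test function supported in [-1, 1]."""
    return math.exp(-1.0 / (1.0 - x * x)) if abs(x) < 1 else 0.0


def bump_prime(x):
    """Derivative of bump: bump(x) * (-2x / (1 - x^2)^2)."""
    return bump(x) * (-2.0 * x / (1.0 - x * x) ** 2) if abs(x) < 1 else 0.0


def integrate(g, a, b, n=20000):
    """Midpoint rule for the integral of g over [a, b]."""
    h = (b - a) / n
    return h * sum(g(a + (k + 0.5) * h) for k in range(n))


def weak_derivative(F, phi_prime, a, b):
    """<F', phi> := -<F, phi'> for a locally integrable F, with supp phi in [a, b]."""
    return -integrate(lambda x: F(x) * phi_prime(x), a, b)


heaviside = lambda x: 1.0 if x > 0 else 0.0

# A shifted test function psi(x) = bump(x - 0.4), supported in [-0.6, 1.4].
psi = lambda x: bump(x - 0.4)
psi_prime = lambda x: bump_prime(x - 0.4)
```

With these definitions, `weak_derivative(heaviside, bump_prime, -1.0, 1.0)` approximates $\phi(0) = e^{-1}$. Likewise, `weak_derivative(abs, psi_prime, -0.6, 1.4)` approximates $\int \operatorname{sgn}(x)\,\psi(x)\,dx$.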


The definitions of convolution in (v) and (vi) are actually equivalent, as we shall 
now show. 


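9.3 Proposition. Suppose that $U$ is open in $\mathbb{R}^n$ and $\psi \in C_c^\infty$. Let $V = \{x : x - y \in
U \text{ for } y \in \operatorname{supp}(\psi)\}$. For $F \in \mathcal{D}'(U)$ and $x \in V$ let $F*\psi(x) = \langle F,\tau_x\tilde\psi\rangle$. Then
a. $F*\psi \in C^\infty(V)$.
b. $\partial^\alpha(F*\psi) = (\partial^\alpha F)*\psi = F*(\partial^\alpha\psi)$.
c. For any $\phi \in C_c^\infty(V)$, $\int(F*\psi)\phi = \langle F,\phi*\tilde\psi\rangle$.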


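Proof. Let $e_1, \dots, e_n$ be the standard basis for $\mathbb{R}^n$. If $x \in V$, there exists $t_0 > 0$
such that $x + te_j \in V$ for $|t| < t_0$, and it is easily verified that

$$t^{-1}(\tau_{x+te_j}\tilde\psi - \tau_x\tilde\psi) \to \tau_x(\partial_j\psi)^\sim \quad\text{in } C_c^\infty(U) \text{ as } t \to 0.$$

It follows that $\partial_j(F*\psi)(x)$ exists and equals $F*\partial_j\psi(x)$, so by induction, $F*\psi \in
C^\infty(V)$ and $\partial^\alpha(F*\psi) = F*\partial^\alpha\psi$. Moreover, since $\partial^\alpha\tilde\psi = (-1)^{|\alpha|}(\partial^\alpha\psi)^\sim$ and
$\partial^\alpha\tau_x = \tau_x\partial^\alpha$, we have

$$(\partial^\alpha F)*\psi(x) = \langle\partial^\alpha F,\tau_x\tilde\psi\rangle = (-1)^{|\alpha|}\langle F,\partial^\alpha\tau_x\tilde\psi\rangle = \langle F,\tau_x(\partial^\alpha\psi)^\sim\rangle = F*(\partial^\alpha\psi)(x).$$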
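Next, if $\phi \in C_c^\infty(V)$, we have

$$\phi * \widetilde{\psi}(x) = \int \phi(y)\psi(y - x)\,dy = \int \phi(y)\,\tau_y\widetilde{\psi}(x)\,dy.$$

The integrand here is continuous and supported in a compact subset of $U$, so
the integral can be approximated by Riemann sums. That is, for each (large)
$m \in \mathbb{N}$ we can approximate $\operatorname{supp}(\phi)$ by a union of cubes of side length $2^{-m}$
(and volume $2^{-nm}$) centered at points $y_1^{(m)}, \dots, y_{k(m)}^{(m)} \in \operatorname{supp}(\phi)$; then the corresponding
Riemann sums $S^m = 2^{-nm}\sum_j \phi(y_j^{(m)})\,\tau_{y_j^{(m)}}\widetilde{\psi}$ are supported in a common
compact subset of $U$ and converge uniformly to $\phi * \widetilde{\psi}$ as $m \to \infty$. Likewise,
$\partial^\alpha S^m = 2^{-nm}\sum_j \phi(y_j^{(m)})\,\tau_{y_j^{(m)}}\partial^\alpha\widetilde{\psi}$ converges uniformly to $\phi * \partial^\alpha\widetilde{\psi} = \partial^\alpha(\phi * \widetilde{\psi})$,
so $S^m \to \phi * \widetilde{\psi}$ in $C_c^\infty(U)$. Hence,

$$\langle F, \phi * \widetilde{\psi}\rangle = \lim_{m\to\infty}\langle F, S^m\rangle = \lim_{m\to\infty} 2^{-nm}\sum_j \phi(y_j^{(m)})\,\langle F, \tau_{y_j^{(m)}}\widetilde{\psi}\rangle = \int \phi(y)\langle F, \tau_y\widetilde{\psi}\rangle\,dy = \int \phi(y)\,F * \psi(y)\,dy.$$

∎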
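The Riemann-sum step in the proof of (c) can be watched numerically. The following is a one-dimensional sketch under our own assumptions: $n = 1$, specific bump functions `phi` and `psi` (hypothetical choices, not from the text), and a much finer sum standing in for the exact convolution $\phi * \widetilde{\psi}$. It forms $S^m$ on the cells of side $2^{-m}$ and measures $\sup|S^m - \phi * \widetilde{\psi}|$.

```python
import numpy as np

def bump(x, center=0.0, radius=1.0):
    # Standard C_c^infinity bump exp(-1/(1 - r^2)), r = (x - center)/radius; zero for |r| >= 1.
    r = (np.asarray(x, dtype=float) - center) / radius
    out = np.zeros_like(r)
    inside = np.abs(r) < 1.0
    out[inside] = np.exp(-1.0 / (1.0 - r[inside] ** 2))
    return out

def phi(y):
    return bump(y, center=0.3, radius=1.0)   # supp(phi) = [-0.7, 1.3]

def psi(y):
    return bump(y, center=0.0, radius=0.5)   # supp(psi) = [-0.5, 0.5]

A, B = -0.7, 1.3                  # endpoints of supp(phi)
x = np.linspace(-2.0, 2.5, 451)   # points at which the functions are compared

def riemann_sum(x, m):
    # S^m(x) = 2^{-m} * sum_j phi(y_j) * (tau_{y_j} psi~)(x), where (tau_y psi~)(x) = psi(y - x)
    # and the y_j are the centers of the cells of side 2^{-m} tiling [A, B].
    h = 2.0 ** (-m)
    n_cells = int(round((B - A) / h))
    y = A + h * (np.arange(n_cells) + 0.5)
    return h * (phi(y)[None, :] * psi(y[None, :] - x[:, None])).sum(axis=1)

# A much finer sum stands in for phi * psi~(x) = int phi(y) psi(y - x) dy.
reference = riemann_sum(x, 11)
errors = [float(np.max(np.abs(riemann_sum(x, m) - reference))) for m in range(2, 9)]
```

The errors shrink rapidly as $m$ grows, and every $S^m$ vanishes for $|x - 0.3| \ge 1.5$, which is the common compact support used in the proof.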


Next we show that although distributions may be highly singular objects, they can 
all be approximated in the (weak*) topology of distributions by smooth functions, 
even by compactly supported ones. 


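9.4 Lemma. Suppose that $\phi \in C_c^\infty$, $\psi \in C_c^\infty$, and $\int\psi = 1$, and let $\psi_t(x) = t^{-n}\psi(t^{-1}x)$.
a. Given any neighborhood $U$ of $\operatorname{supp}(\phi)$, we have $\operatorname{supp}(\phi * \psi_t) \subset U$ for $t$
sufficiently small.

b. $\phi * \psi_t \to \phi$ in $C_c^\infty$ as $t \to 0$.


Proof. If $\operatorname{supp}(\psi) \subset \{x : |x| \le R\}$ then $\operatorname{supp}(\phi * \psi_t)$ is contained in the set of
points whose distance from $\operatorname{supp}(\phi)$ is at most $tR$; this is included in a fixed compact
set if $t \le 1$ and is included in $U$ if $t$ is small. Moreover, $\partial^\alpha(\phi * \psi_t) = (\partial^\alpha\phi) * \psi_t \to \partial^\alpha\phi$
uniformly as $t \to 0$, by Theorem 8.14. The result follows. ∎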
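Lemma 9.4 can be checked on a grid. The sketch below assumes $n = 1$, takes $\phi$ to be the standard bump $e^{-1/(1-x^2)}$ and $\psi$ the same bump normalized so that $\int\psi = 1$ (so $R = 1$), and replaces convolution by a discrete sum with step `dx`; all of these are illustrative choices. For each $t$ it records $\sup|\phi * \psi_t - \phi|$, $\sup|\phi' * \psi_t - \phi'|$ (recall $(\phi * \psi_t)' = \phi' * \psi_t$), and whether $\phi * \psi_t$ vanishes at distance more than $tR$ from $\operatorname{supp}(\phi)$.

```python
import numpy as np

def bump(x):
    # C_c^infinity bump exp(-1/(1 - x^2)) on (-1, 1), zero elsewhere.
    x = np.asarray(x, dtype=float)
    out = np.zeros_like(x)
    m = np.abs(x) < 1.0
    out[m] = np.exp(-1.0 / (1.0 - x[m] ** 2))
    return out

def dbump(x):
    # Exact derivative of bump: bump(x) * (-2x / (1 - x^2)^2).
    x = np.asarray(x, dtype=float)
    out = np.zeros_like(x)
    m = np.abs(x) < 1.0
    out[m] = np.exp(-1.0 / (1.0 - x[m] ** 2)) * (-2.0 * x[m] / (1.0 - x[m] ** 2) ** 2)
    return out

dx = 1e-3
x = np.arange(-3000, 3001) * dx       # grid on [-3, 3] containing 0
phi, dphi = bump(x), dbump(x)         # supp(phi) = [-1, 1]

s_fine = np.linspace(-1.0, 1.0, 200001)
mass = np.sum(bump(s_fine)) * (s_fine[1] - s_fine[0])   # integral of bump, so psi = bump / mass

def mollify(values, t):
    # Discrete (values * psi_t)(x) with psi_t(s) = t^{-1} psi(s / t); supp(psi_t) = [-t, t].
    K = int(np.ceil(t / dx)) + 1
    s = np.arange(-K, K + 1) * dx
    kernel = bump(s / t) / (mass * t)
    return np.convolve(values, kernel, mode="same") * dx

results = {}
for t in (0.4, 0.2, 0.1, 0.05):
    conv = mollify(phi, t)
    err0 = float(np.max(np.abs(conv - phi)))
    err1 = float(np.max(np.abs(mollify(dphi, t) - dphi)))
    outside = np.abs(x) > 1.0 + t + 2 * dx   # farther than t*R (plus a grid margin) from supp(phi)
    results[t] = (err0, err1, bool(np.all(conv[outside] == 0.0)))
```

With an odd-length kernel centered at $0$, `np.convolve(..., mode="same")` keeps the output aligned with the grid `x`, which is why the kernel is built on a symmetric index range.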


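9.5 Proposition. For any open $U \subset \mathbb{R}^n$, $C_c^\infty(U)$ is dense in $\mathcal{D}'(U)$ in the topology
of $\mathcal{D}'(U)$.


Proof. Suppose $F \in \mathcal{D}'(U)$. We shall first approximate $F$ by distributions
supported in compact subsets of $U$, then approximate the latter by functions in
$C_c^\infty(U)$.

Let $\{V_j\}$ be an increasing sequence of precompact open subsets of $U$ whose union
is $U$, as in Proposition 4.39. For each $j$, by the $C^\infty$ Urysohn lemma we can pick
$\zeta_j \in C_c^\infty(U)$ such that $\zeta_j = 1$ on $\overline{V_j}$. Given $\phi \in C_c^\infty(U)$, for $j$ sufficiently large we
have $\operatorname{supp}(\phi) \subset V_j$ and hence $\langle F, \phi\rangle = \langle F, \zeta_j\phi\rangle = \langle \zeta_jF, \phi\rangle$. Therefore $\zeta_jF \to F$
as $j \to \infty$.

Now, as we noted in defining products of smooth functions and distributions,
since $\operatorname{supp}(\zeta_j)$ is compact, $\zeta_jF$ can be regarded as a distribution on $\mathbb{R}^n$. Let $\psi$, $\psi_t$
be as in Lemma 9.4, and $\widetilde{\psi}(x) = \psi(-x)$. Then $\int\widetilde{\psi} = 1$ also, so given $\phi \in C_c^\infty$,
we have $\phi * \widetilde{\psi}_t \to \phi$ in $C_c^\infty$ by Lemma 9.4. But then by Proposition 9.3, we
have $(\zeta_jF) * \psi_t \in C^\infty$ and $\langle (\zeta_jF) * \psi_t, \phi\rangle = \langle \zeta_jF, \phi * \widetilde{\psi}_t\rangle \to \langle \zeta_jF, \phi\rangle$, so
$(\zeta_jF) * \psi_t \to \zeta_jF$ in $\mathcal{D}'$. In short, every neighborhood of $F$ in $\mathcal{D}'(U)$ contains the
$C^\infty$ functions $(\zeta_jF) * \psi_t$ for $j$ large and $t$ small.

Finally, we observe that $\operatorname{supp}(\zeta_j) \subset V_k$ for some $k$. If $\operatorname{supp}(\phi) \cap \overline{V_k} = \varnothing$, then
for sufficiently small $t$ we have $\operatorname{supp}(\phi * \widetilde{\psi}_t) \cap V_k = \varnothing$ (Lemma 9.4 again) and hence
$\langle (\zeta_jF) * \psi_t, \phi\rangle = \langle F, \zeta_j(\phi * \widetilde{\psi}_t)\rangle = 0$. In other words, $\operatorname{supp}((\zeta_jF) * \psi_t) \subset \overline{V_k} \subset U$,
so we are done. ∎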


We conclude this section with some further remarks and examples concerning
differentiation of distributions. To restate the basic facts: Every $F \in \mathcal{D}'(U)$ possesses
derivatives $\partial^\alpha F \in \mathcal{D}'(U)$ of all orders; moreover, $\partial^\alpha$ is a continuous linear map
of $\mathcal{D}'(U)$ into itself. Let us examine a couple of one-dimensional examples to see
what sort of things arise by taking distribution derivatives of functions that are not
classically differentiable.

First, differentiating functions with jump discontinuities leads to “delta-functions,”
that is, distributions given by measures that are point masses. The simplest example
is the Heaviside step function $H = \chi_{(0,\infty)}$, for which we have

$$\langle H', \phi\rangle = -\langle H, \phi'\rangle = -\int_0^\infty \phi'(x)\,dx = \phi(0) = \langle \delta, \phi\rangle,$$

so $H' = \delta$. See Exercises 5 and 7 for generalizations.
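As a numerical sanity check of $H' = \delta$, one can evaluate $-\int_0^\infty \phi'$ for a particular test function and compare it with $\phi(0)$. The bump $\phi$ below (centered at $0.2$ so that $\phi(0) \ne 0$) and the trapezoid rule are our own illustrative choices.

```python
import numpy as np

def trapezoid(y, x):
    # Composite trapezoid rule (written out to avoid version-specific NumPy helpers).
    return float(np.sum((y[1:] + y[:-1]) * np.diff(x)) / 2.0)

def phi(x):
    # Hypothetical test function: exp(-1/(1 - r^2)), r = x - 0.2, supported in [-0.8, 1.2].
    r = np.asarray(x, dtype=float) - 0.2
    out = np.zeros_like(r)
    m = np.abs(r) < 1.0
    out[m] = np.exp(-1.0 / (1.0 - r[m] ** 2))
    return out

def dphi(x):
    # Exact derivative of phi: phi(x) * (-2r / (1 - r^2)^2).
    r = np.asarray(x, dtype=float) - 0.2
    out = np.zeros_like(r)
    m = np.abs(r) < 1.0
    out[m] = np.exp(-1.0 / (1.0 - r[m] ** 2)) * (-2.0 * r[m] / (1.0 - r[m] ** 2) ** 2)
    return out

# <H', phi> = -<H, phi'> = -int_0^infinity phi'(x) dx; phi' vanishes for x >= 1.2.
x = np.linspace(0.0, 1.2, 200001)
pairing = -trapezoid(dphi(x), x)
delta_value = float(phi(np.array([0.0]))[0])   # <delta, phi> = phi(0)
```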

Second, distribution derivatives can be used to extract “finite parts” from divergent
integrals. For example, let $f(x) = x^{-1}\chi_{(0,\infty)}(x)$. $f$ is locally integrable on $\mathbb{R}\setminus\{0\}$
and so defines a distribution there, but $\int f\phi$ diverges whenever $\phi(0) \ne 0$. Nonetheless,
there is a distribution on $\mathbb{R}$ that agrees with $f$ on $\mathbb{R}\setminus\{0\}$, namely, the distribution
derivative of the locally integrable function $L(x) = (\log x)\chi_{(0,\infty)}(x)$. One way of
seeing what is going on here is to consider the functions $L_\epsilon(x) = (\log x)\chi_{(\epsilon,\infty)}(x)$.
By the dominated convergence theorem we have $\int L\phi = \lim_{\epsilon\to 0}\int L_\epsilon\phi$ for any
$\phi \in C_c^\infty$, that is, $L_\epsilon \to L$ in $\mathcal{D}'$; it follows that $L_\epsilon' \to L'$ in $\mathcal{D}'$. But

$$\langle L_\epsilon', \phi\rangle = -\langle L_\epsilon, \phi'\rangle = -\int_\epsilon^\infty \phi'(x)\log x\,dx = \int_\epsilon^\infty \frac{\phi(x)}{x}\,dx + \phi(\epsilon)\log\epsilon.$$

As $\epsilon \to 0$, this last sum converges even though the two terms individually do not.
Formally, passage to the limit gives $\langle L', \phi\rangle = \int f\phi + (\log 0)\phi(0)$; that is, $L'$ is
obtained from $f$ by subtracting an infinite multiple of $\delta$. (This process is akin to the
“renormalizations” used by physicists to remove the divergences from quantum field
theory.)
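The convergence of $\int_\epsilon^\infty \phi(x)x^{-1}\,dx + \phi(\epsilon)\log\epsilon$ can be observed numerically. The sketch assumes the test function $\phi(x) = e^{-x^2}$ (rapidly decreasing rather than compactly supported, which does not affect the computation); for this $\phi$, $\langle L', \phi\rangle = -\int_0^\infty \phi'(x)\log x\,dx = -\gamma/2$, where $\gamma$ is Euler's constant. The substitution $x = e^u$ is our own device to keep the quadrature accurate near $x = \epsilon$.

```python
import numpy as np

def trapezoid(y, x):
    return float(np.sum((y[1:] + y[:-1]) * np.diff(x)) / 2.0)

def phi(x):
    # Hypothetical test function with phi(0) = 1.
    return np.exp(-x ** 2)

def pairing_terms(eps, n=200001):
    # int_eps^infinity phi(x)/x dx via x = e^u (so dx/x = du), cut off at x = e^3 where phi is negligible.
    u = np.linspace(np.log(eps), 3.0, n)
    integral = trapezoid(phi(np.exp(u)), u)
    boundary = float(phi(eps) * np.log(eps))
    return integral, boundary

table = {eps: pairing_terms(eps) for eps in (1e-2, 1e-4, 1e-6)}
limit = -np.euler_gamma / 2.0   # <L', phi> for phi(x) = exp(-x^2)
```

Each term grows like $\mp\log(1/\epsilon)$ while their sum settles down to $-\gamma/2$.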

Another way to analyze this situation is to consider smooth approximations to $L$,
such as $L^\epsilon(x) = L(x)\psi(x/\epsilon)$ where $\psi$ is a smooth function such that $\psi(x) = 0$ for
$x < 1$ and $\psi(x) = 1$ for $x > 2$. The reader is invited to sketch the graphs of $L^\epsilon$ and
$(L^\epsilon)'$; the latter will look like the graph of $f$ together with a large negative spike near
the origin, which turns into “$-\infty\cdot\delta$” as $\epsilon \to 0$. See also Exercises 10 and 12.

Finally, we remark that one of the bugbears of advanced calculus, that equality
of mixed partials need not hold for functions whose derivatives are not continuous,
disappears in the setting of distributions: $\partial_j\partial_k = \partial_k\partial_j$ on $C_c^\infty$; therefore $\partial_j\partial_k = \partial_k\partial_j$
on $\mathcal{D}'$! In the standard counterexample, $f(x, y) = xy(x^2 - y^2)(x^2 + y^2)^{-1}$
(with $f(0,0) = 0$), $\partial_x\partial_y f$ and $\partial_y\partial_x f$ are locally integrable functions that agree
everywhere except at the origin; hence they are identical as distributions.
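The classical failure for this $f$ is easy to see with nested central differences. In the sketch below the step sizes `H` and `K` are our own choices; the inner derivative uses the much smaller step, so that at the origin the computation approximates $\partial_y(\partial_x f)(0,0) = -1$ and $\partial_x(\partial_y f)(0,0) = 1$ (since $\partial_x f(0,y) = -y$ and $\partial_y f(x,0) = x$), whereas at a point away from the origin the two orders agree.

```python
def f(x, y):
    # The classical counterexample xy(x^2 - y^2)/(x^2 + y^2), with f(0, 0) = 0.
    if x == 0.0 and y == 0.0:
        return 0.0
    return x * y * (x * x - y * y) / (x * x + y * y)

H = 1e-7   # step for the inner (first) derivative
K = 1e-3   # step for the outer (second) derivative; K >> H

def f_x(x, y):
    return (f(x + H, y) - f(x - H, y)) / (2 * H)

def f_y(x, y):
    return (f(x, y + H) - f(x, y - H)) / (2 * H)

def d_y_of_f_x(x, y):
    # Central difference in y of the x-derivative.
    return (f_x(x, y + K) - f_x(x, y - K)) / (2 * K)

def d_x_of_f_y(x, y):
    # Central difference in x of the y-derivative.
    return (f_y(x + K, y) - f_y(x - K, y)) / (2 * K)

at_origin = (d_y_of_f_x(0.0, 0.0), d_x_of_f_y(0.0, 0.0))   # approximately (-1, 1)
away = (d_y_of_f_x(1.0, 2.0), d_x_of_f_y(1.0, 2.0))        # approximately equal
```

The discrepancy lives at the single point $(0,0)$, a set of measure zero, which is why it disappears when the mixed partials are regarded as distributions.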
Exercises


1. Suppose that $f_1, f_2, \dots$, and $f$ are in $L^1_{\mathrm{loc}}(U)$. The conditions in (a) and (b)
below imply that $f_n \to f$ in $\mathcal{D}'(U)$, but the condition in (c) does not.

a. $f_n \in L^p(U)$ $(1 \le p \le \infty)$ and $f_n \to f$ in the $L^p$ norm or weakly in $L^p$.

b. For all $n$, $|f_n| \le g$ for some $g \in L^1_{\mathrm{loc}}(U)$, and $f_n \to f$ a.e.

c. $f_n \to f$ pointwise.


2. The product rule for derivatives is valid for products of smooth functions and 
distributions. 


3. On $\mathbb{R}$, if $\psi \in C^\infty$ then $\psi\delta^{(k)} = \sum_{j=0}^{k}(-1)^{k-j}\binom{k}{j}\psi^{(k-j)}(0)\,\delta^{(j)}$, where the superscripts denote derivatives.


4. Suppose that $U$ and $V$ are open in $\mathbb{R}^n$ and $\Phi: V \to U$ is a $C^\infty$ diffeomorphism.
Explain how to define $F \circ \Phi \in \mathcal{D}'(V)$ for any $F \in \mathcal{D}'(U)$.


5. Suppose that $f$ is continuously differentiable on $\mathbb{R}$ except at $x_1, \dots, x_m$, where
$f$ has jump discontinuities, and that its pointwise derivative $df/dx$ (defined except
at the $x_j$'s) is in $L^1_{\mathrm{loc}}(\mathbb{R})$. Then the distribution derivative $f'$ of $f$ is given by
$f' = (df/dx) + \sum_{j=1}^{m}[f(x_j+) - f(x_j-)]\,\delta_{x_j}$, where $\delta_{x_j}$ is the point mass at $x_j$.


6. If $f$ is absolutely continuous on compact subsets of an interval $U \subset \mathbb{R}$, the
distribution derivative $f' \in \mathcal{D}'(U)$ coincides with the pointwise (a.e.-defined) derivative
of $f$.


7. Suppose $f \in L^1_{\mathrm{loc}}(\mathbb{R})$. Then the distribution derivative $f'$ is a complex measure
on $\mathbb{R}$ iff $f$ agrees a.e. with a function $F \in NBV$, in which case $\langle f', \phi\rangle = \int \phi\,dF$.


8. Suppose $f \in L^p(\mathbb{R}^n)$. If the strong $L^p$ derivatives $\partial_j f$ exist in the sense of
Exercise 8 in §8.2, they coincide with the partial derivatives of $f$ in the sense of
distributions.


9. A distribution $F$ on $\mathbb{R}^n$ is called homogeneous of degree $\lambda$ if $F \circ S_r = r^\lambda F$ for
all $r > 0$, where $S_r(x) = rx$.
a. $\delta$ is homogeneous of degree $-n$.
b. If $F$ is homogeneous of degree $\lambda$, then $\partial^\alpha F$ is homogeneous of degree $\lambda - |\alpha|$.
c. The distribution $(d/dx)[\chi_{(0,\infty)}(x)\log x]$ discussed in the text is not homogeneous,
although it agrees on $\mathbb{R}\setminus\{0\}$ with a function that is homogeneous of
degree $-1$.


10. Let $f$ be a continuous function on $\mathbb{R}^n\setminus\{0\}$ that is homogeneous of degree $-n$
(i.e., $f(rx) = r^{-n}f(x)$) and has mean zero on the unit sphere (i.e., $\int f\,d\sigma = 0$
where $\sigma$ is surface measure on the sphere). Then $f$ is not locally integrable near the
origin (unless $f = 0$), but the formula

$$\langle PV(f), \phi\rangle = \lim_{\epsilon\to 0}\int_{|x|>\epsilon} f(x)\phi(x)\,dx \qquad (\phi \in C_c^\infty)$$







defines a distribution $PV(f)$ (“PV” stands for “principal value”) that agrees
with $f$ on $\mathbb{R}^n\setminus\{0\}$ and is homogeneous of degree $-n$ in the sense of Exercise 9.
(Hint: For any $a > 0$, the indicated limit equals

$$\int_{|x|\le a} f(x)[\phi(x) - \phi(0)]\,dx + \int_{|x|>a} f(x)\phi(x)\,dx,$$

and these integrals converge absolutely.)


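11. Let $F$ be a distribution on $\mathbb{R}^n$ such that $\operatorname{supp}(F) = \{0\}$.
a. There exist $N \in \mathbb{N}$, $C > 0$ such that for all $\phi \in C_c^\infty$,

$$|\langle F, \phi\rangle| \le C\sum_{|\alpha|\le N}\sup_{|x|\le 1}|\partial^\alpha\phi(x)|.$$

b. Fix $\psi \in C_c^\infty$ with $\psi(x) = 1$ for $|x| \le 1$ and $\psi(x) = 0$ for $|x| \ge 2$. If
$\phi \in C_c^\infty$, let $\phi_k(x) = \phi(x)[1 - \psi(kx)]$. If $\partial^\alpha\phi(0) = 0$ for $|\alpha| \le N$, then
$\partial^\alpha\phi_k \to \partial^\alpha\phi$ uniformly as $k \to \infty$ for $|\alpha| \le N$. (Hint: By Taylor's theorem,
$|\partial^\alpha\phi(x)| \le C|x|^{N+1-|\alpha|}$ for $|\alpha| \le N$.)

c. If $\phi \in C_c^\infty$ and $\partial^\alpha\phi(0) = 0$ for $|\alpha| \le N$, then $\langle F, \phi\rangle = 0$.

d. There exist constants $c_\alpha$ $(|\alpha| \le N)$ such that $F = \sum_{|\alpha|\le N}c_\alpha\partial^\alpha\delta$.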


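12. Suppose $\lambda \ge n$; then the function $x \mapsto |x|^{-\lambda}$ on $\mathbb{R}^n$ is not locally integrable
near the origin. Here are some ways to make it into a distribution:
a. If $\phi \in C_c^\infty$, let $P_k$ be the Taylor polynomial of $\phi$ about $x = 0$ of degree $k$.
Given $k > \lambda - n - 1$ and $a > 0$, define

$$\langle F_a, \phi\rangle = \int_{|x|\le a}[\phi(x) - P_k(x)]\,|x|^{-\lambda}\,dx + \int_{|x|>a}\phi(x)|x|^{-\lambda}\,dx.$$

Then $F_a$ is a distribution on $\mathbb{R}^n$ that agrees with $|x|^{-\lambda}$ on $\mathbb{R}^n\setminus\{0\}$.

b. If $\lambda \notin \mathbb{Z}$ and we take $k$ to be the greatest integer $< \lambda - n$, we can let $a \to \infty$
in (a) to obtain another distribution $F$ that agrees with $|x|^{-\lambda}$ on $\mathbb{R}^n\setminus\{0\}$:

$$\langle F, \phi\rangle = \int[\phi(x) - P_k(x)]\,|x|^{-\lambda}\,dx.$$

c. Let $n = 1$ and let $k$ be the greatest integer $\le \lambda$. Let

$$f(x) = \begin{cases} [(k-\lambda)\cdots(1-\lambda)]^{-1}(\operatorname{sgn}x)^k|x|^{k-\lambda} & \text{if } \lambda > k,\\ (-1)^{k-1}[(k-1)!]^{-1}(\operatorname{sgn}x)^k\log|x| & \text{if } \lambda = k.\end{cases}$$

Then $f \in L^1_{\mathrm{loc}}(\mathbb{R})$, and the distribution derivative $f^{(k)}$ agrees with $|x|^{-\lambda}$ on
$\mathbb{R}\setminus\{0\}$.

d. According to Exercise 11, the difference between any two of the distributions
constructed in (a)–(c) is a linear combination of $\delta$ and its derivatives. Which
one?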
13. If $F \in \mathcal{D}'$ and $\partial_jF = 0$ for $j = 1, \dots, n$, then $F$ is a constant function.
(Consider $F * \phi_j$ where $\{\phi_j\}$ is an approximate identity in $C_c^\infty$.)


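14. For $n \ge 3$, define $F, F^\epsilon \in L^1_{\mathrm{loc}}(\mathbb{R}^n)$ by

$$F(x) = \frac{|x|^{2-n}}{(2-n)\omega_n}, \qquad F^\epsilon(x) = \frac{(|x|^2 + \epsilon^2)^{(2-n)/2}}{(2-n)\omega_n},$$

where $\omega_n = 2\pi^{n/2}/\Gamma(n/2)$ is the area of the unit sphere, and let $\Delta$ be the
Laplacian.
a. $\Delta F^\epsilon(x) = \epsilon^{-n}g(\epsilon^{-1}x)$ where $g(x) = n\omega_n^{-1}(|x|^2 + 1)^{-(n+2)/2}$.
b. $\int g = 1$. (Use polar coordinates and set $s = r^2/(r^2 + 1)$.)
c. $\Delta F = \delta$. ($F^\epsilon \to F$ in $\mathcal{D}'$; use Proposition 9.1.)
d. If $\phi \in C_c^\infty$, the function $f = F * \phi$ satisfies $\Delta f = \phi$.
e. The results of (c) and (d) hold also for $n = 1$ but can be proved more simply
there. For $n = 2$, they hold provided $F$, $F^\epsilon$ are defined by $F(x) = (2\pi)^{-1}\log|x|$
and $F^\epsilon(x) = (4\pi)^{-1}\log(|x|^2 + \epsilon^2)$.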


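15. Define $G$ on $\mathbb{R}^n \times \mathbb{R}$ by $G(x, t) = (4\pi t)^{-n/2}e^{-|x|^2/4t}\chi_{(0,\infty)}(t)$.
a. $(\partial_t - \Delta)G = \delta$, where $\Delta$ is the Laplacian on $\mathbb{R}^n$. (Let $G^\epsilon(x, t) = G(x, t)\chi_{(\epsilon,\infty)}(t)$;
then $G^\epsilon \to G$ in $\mathcal{D}'$. Compute $\langle(\partial_t - \Delta)G^\epsilon, \phi\rangle$ for $\phi \in C_c^\infty$,
recalling the discussion of the heat equation in §8.7.)
b. If $\phi \in C_c^\infty(\mathbb{R}^n \times \mathbb{R})$, the function $f = G * \phi$ satisfies $(\partial_t - \Delta)f = \phi$.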
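The principal-value construction of Exercise 10 can be explored numerically in the simplest case $n = 1$, $f(x) = 1/x$ (homogeneous of degree $-1$, with mean zero on the “sphere” $\{\pm 1\}$). The sketch compares the truncated integrals $\int_{|x|>\epsilon}\phi(x)x^{-1}\,dx$ with the absolutely convergent expression in the hint; the test function $\phi(x) = e^{-(x-1)^2}$, the cutoff $R = 12$, and the grids are assumptions made for illustration.

```python
import numpy as np

def trapezoid(y, x):
    return float(np.sum((y[1:] + y[:-1]) * np.diff(x)) / 2.0)

def phi(x):
    # Hypothetical test function: rapidly decreasing, not even, with phi(0) != 0.
    return np.exp(-(x - 1.0) ** 2)

R = 12.0   # phi is negligible for |x| > R, so all integrals are cut off there

def truncated_pv(eps, n=400001):
    # int_{eps < |x| < R} phi(x)/x dx = int_eps^R (phi(x) - phi(-x))/x dx, computed with x = e^u.
    u = np.linspace(np.log(eps), np.log(R), n)
    x = np.exp(u)
    return trapezoid(phi(x) - phi(-x), u)

def hint_formula(a, n=400000):
    # int_{|x| <= a} (phi(x) - phi(0))/x dx + int_{a < |x| < R} phi(x)/x dx.
    # n is even, so x = 0 is not a grid node; the first integrand is bounded near 0.
    x = np.linspace(-a, a, n)
    inner = trapezoid((phi(x) - phi(0.0)) / x, x)
    y = np.linspace(a, R, n)
    outer = trapezoid((phi(y) - phi(-y)) / y, y)
    return inner + outer
```

As $\epsilon$ decreases, `truncated_pv(eps)` approaches `hint_formula(a)`, and the value of the hint expression does not depend on the choice of $a$.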


9.2 COMPACTLY SUPPORTED, TEMPERED, AND PERIODIC 
DISTRIBUTIONS 


If $U$ is an open set in $\mathbb{R}^n$, the space of all distributions on $U$ whose support is a
compact subset of $U$ is denoted by $\mathcal{E}'(U)$; as usual, we set $\mathcal{E}' = \mathcal{E}'(\mathbb{R}^n)$. $\mathcal{E}'(U)$ turns
out to be a dual space in its own right, as we shall now show.

The space $C^\infty(U)$ of $C^\infty$ functions on $U$ is a Fréchet space with the $C^\infty$ topology,
that is, the topology of uniform convergence of functions, together with all their
derivatives, on compact subsets of $U$. This topology can be defined by a countable
family of seminorms as follows. Let $\{V_m\}_1^\infty$ be an increasing sequence of precompact
open subsets of $U$ whose union is $U$, as in Proposition 4.39; then for each $m \in \mathbb{N}$
and each multi-index $\alpha$ we have the seminorm

$$\|f\|_{[m,\alpha]} = \sup_{x\in V_m}|\partial^\alpha f(x)|. \tag{9.6}$$

Clearly $\partial^\alpha f_j \to \partial^\alpha f$ uniformly on compact sets for all $\alpha$ iff $\|f_j - f\|_{[m,\alpha]} \to 0$ for
all $m, \alpha$; a different choice of sets $V_m$ would yield an equivalent family of seminorms.


9.7 Proposition. $C_c^\infty(U)$ is dense in $C^\infty(U)$.

Proof. Let $\{V_m\}_1^\infty$ be as in (9.6). For each $m$, by the $C^\infty$ Urysohn lemma we can
pick $\psi_m \in C_c^\infty(U)$ with $\psi_m = 1$ on $\overline{V_m}$. If $\phi \in C^\infty(U)$, clearly $\|\psi_m\phi - \phi\|_{[m_0,\alpha]} = 0$
provided $m \ge m_0$; thus $\psi_m\phi \to \phi$ in the $C^\infty$ topology. ∎


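9.8 Theorem. $\mathcal{E}'(U)$ is the dual space of $C^\infty(U)$. More precisely: If $F \in \mathcal{E}'(U)$,
then $F$ extends uniquely to a continuous linear functional on $C^\infty(U)$; and if $G$ is a
continuous linear functional on $C^\infty(U)$, then $G|C_c^\infty(U) \in \mathcal{E}'(U)$.


Proof. If $F \in \mathcal{E}'(U)$, choose $\psi \in C_c^\infty(U)$ with $\psi = 1$ on $\operatorname{supp}(F)$, and define
the linear functional $G$ on $C^\infty(U)$ by $\langle G, \phi\rangle = \langle F, \psi\phi\rangle$. Since $F$ is continuous
on $C_c^\infty(\operatorname{supp}(\psi))$, and the topology of the latter is defined by the norms $\phi \mapsto \|\partial^\alpha\phi\|_u$,
by Proposition 5.15 there exist $N \in \mathbb{N}$ and $C > 0$ such that $|\langle G, \phi\rangle| \le C\sum_{|\alpha|\le N}\|\partial^\alpha(\psi\phi)\|_u$
for $\phi \in C^\infty(U)$. By the product rule, if we choose $m$ large
enough so that $\operatorname{supp}(\psi) \subset V_m$, this implies that

$$|\langle G, \phi\rangle| \le C'\sum_{|\alpha|\le N}\sup_{x\in\operatorname{supp}(\psi)}|\partial^\alpha\phi(x)| \le C'\sum_{|\alpha|\le N}\|\phi\|_{[m,\alpha]},$$

so that $G$ is continuous on $C^\infty(U)$. That $G$ is the unique continuous extension of $F$
follows from Proposition 9.7.

On the other hand, if $G$ is a continuous linear functional on $C^\infty(U)$, by Proposition
5.15 there exist $C, m, N$ such that $|\langle G, \phi\rangle| \le C\sum_{|\alpha|\le N}\|\phi\|_{[m,\alpha]}$ for all
$\phi \in C^\infty(U)$. Since $\|\phi\|_{[m,\alpha]} \le \|\partial^\alpha\phi\|_u$, this implies that $G$ is continuous on $C_c^\infty(K)$ for
each compact $K \subset U$, so $G|C_c^\infty(U) \in \mathcal{D}'(U)$. Moreover, if $\operatorname{supp}(\phi) \cap V_m = \varnothing$,
then $\langle G, \phi\rangle = 0$; hence $\operatorname{supp}(G) \subset \overline{V_m}$ and $G|C_c^\infty(U) \in \mathcal{E}'(U)$. ∎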


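The operations of differentiation, multiplication by $C^\infty$ functions, translation,
and composition with linear maps discussed in §9.1 all preserve the class $\mathcal{E}'$. As for
convolution, there is more to be said.

First, if $F \in \mathcal{E}'$ and $\phi \in C_c^\infty$ then $F * \phi \in C_c^\infty$, as Proposition 8.6d remains
valid in this setting. Second, if $F \in \mathcal{E}'$ and $\psi \in C^\infty$, $F * \psi$ can be defined as a $C^\infty$
function or as a distribution just as before:

$$F * \psi(x) = \langle F, \tau_x\widetilde{\psi}\rangle, \qquad \langle F * \psi, \phi\rangle = \langle F, \phi * \widetilde{\psi}\rangle \quad (\phi \in C_c^\infty)$$

(see Exercise 16). Finally, a further dualization allows us to define convolutions of
arbitrary distributions with compactly supported distributions. To wit, if $F \in \mathcal{D}'$ and
$G \in \mathcal{E}'$, we can define $F * G \in \mathcal{D}'$ and $G * F \in \mathcal{D}'$ as follows:

$$\langle F * G, \phi\rangle = \langle F, \widetilde{G} * \phi\rangle, \qquad \langle G * F, \phi\rangle = \langle G, \widetilde{F} * \phi\rangle \quad (\phi \in C_c^\infty),$$

where $\langle \widetilde{G}, \phi\rangle = \langle G, \widetilde{\phi}\rangle$, and likewise for $\widetilde{F}$. The proof that $F * G$ and $G * F$ are
indeed distributions (i.e., that they are continuous on $C_c^\infty$) and that $F * G = G * F$
requires a closer examination of the continuity of the maps involved. We shall not
pursue this matter here; however, see Exercises 20 and 21.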

A notable omission from our list of operations that can be extended from functions
to distributions is the Fourier transform $\mathcal{F}$. The trouble is that $\mathcal{F}$ does not map $C_c^\infty$
into itself; in fact, if $\phi \in C_c^\infty$, then $\widehat{\phi}$ cannot vanish on any nonempty open set
unless $\phi = 0$. To see this, suppose $\widehat{\phi} = 0$ on a neighborhood of $\xi_0$. Replacing $\phi$
by $e^{-2\pi i\xi_0\cdot x}\phi$ we may assume that $\xi_0 = 0$. Since $\phi$ has compact support, we can
expand $e^{-2\pi i\xi\cdot x}$ in its Maclaurin series and integrate term by term to obtain

$$\widehat{\phi}(\xi) = \sum_{k=0}^{\infty}\int\frac{(-2\pi i\xi\cdot x)^k}{k!}\phi(x)\,dx = \sum_\alpha\frac{\xi^\alpha}{\alpha!}\int(-2\pi ix)^\alpha\phi(x)\,dx$$

(see Exercise 2a in §8.1). But $\int(-2\pi ix)^\alpha\phi(x)\,dx = \partial^\alpha\widehat{\phi}(0)$ for all $\alpha$ by Theorem
8.22d. These derivatives all vanish by assumption, so $\widehat{\phi} = 0$ and hence $\phi = 0$.

However, we do have available a slightly larger space of smooth functions that is
mapped into itself by $\mathcal{F}$, namely, the Schwartz class $\mathcal{S}$. We recall that $\mathcal{S}$ is a Fréchet
space with the topology defined by the norms

$$\|\phi\|_{(N,\alpha)} = \sup_x(1 + |x|)^N|\partial^\alpha\phi(x)|.$$


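9.9 Proposition. Suppose $\psi \in C_c^\infty$ and $\psi(0) = 1$, and let $\psi^\epsilon(x) = \psi(\epsilon x)$. Then
for any $\phi \in \mathcal{S}$, $\psi^\epsilon\phi \to \phi$ in $\mathcal{S}$ as $\epsilon \to 0$. In particular, $C_c^\infty$ is dense in $\mathcal{S}$.


Proof. Given $N \in \mathbb{N}$, for any $\eta > 0$ we can choose a compact set $K$ such that
$(1 + |x|)^N|\phi(x)| < \eta$ for $x \notin K$. Since $\psi^\epsilon(x) \to 1$ uniformly for $x \in K$ as $\epsilon \to 0$,
it follows easily that $\|\psi^\epsilon\phi - \phi\|_{(N,0)} \to 0$ for every $N$. For the norms involving
derivatives, we observe that by the product rule,

$$(1 + |x|)^N\partial^\alpha(\psi^\epsilon\phi - \phi) = (1 + |x|)^N(\psi^\epsilon\partial^\alpha\phi - \partial^\alpha\phi) + E_\epsilon(x),$$

where $E_\epsilon$ is a sum of terms involving derivatives of $\psi^\epsilon$. Since

$$|\partial^\beta\psi^\epsilon(x)| = \epsilon^{|\beta|}|\partial^\beta\psi(\epsilon x)| \le C\epsilon^{|\beta|},$$

we have $\|E_\epsilon\|_u \le C\epsilon \to 0$ as $\epsilon \to 0$. The preceding argument then shows that
$\|\psi^\epsilon\phi - \phi\|_{(N,\alpha)} \to 0$. ∎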
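The convergence $\psi^\epsilon\phi \to \phi$ in $\mathcal{S}$ can be observed for a specific pair. The sketch assumes $n = 1$, $\phi(x) = e^{-x^2}$, the cutoff $\psi(r) = \exp(1 - 1/(1 - r^2))$ for $|r| < 1$ (so $\psi(0) = 1$), and $N = 4$; it evaluates $\|\psi^\epsilon\phi - \phi\|_{(4,0)}$ and $\|\psi^\epsilon\phi - \phi\|_{(4,1)}$ on a finite grid, so the values only approximate the true suprema.

```python
import numpy as np

def psi(r):
    # Hypothetical cutoff in C_c^infinity with psi(0) = 1: exp(1 - 1/(1 - r^2)) on |r| < 1.
    r = np.asarray(r, dtype=float)
    out = np.zeros_like(r)
    m = np.abs(r) < 1.0
    out[m] = np.exp(1.0 - 1.0 / (1.0 - r[m] ** 2))
    return out

def dpsi(r):
    # Exact derivative psi'(r) = psi(r) * (-2r / (1 - r^2)^2).
    r = np.asarray(r, dtype=float)
    out = np.zeros_like(r)
    m = np.abs(r) < 1.0
    out[m] = np.exp(1.0 - 1.0 / (1.0 - r[m] ** 2)) * (-2.0 * r[m] / (1.0 - r[m] ** 2) ** 2)
    return out

x = np.linspace(-20.0, 20.0, 40001)
phi = np.exp(-x ** 2)              # a Schwartz function
dphi = -2.0 * x * phi
weight = (1.0 + np.abs(x)) ** 4    # N = 4 in ||.||_(N, alpha)

def seminorms(eps):
    # Grid approximations of ||psi^eps phi - phi||_(4,0) and ||psi^eps phi - phi||_(4,1).
    diff = (psi(eps * x) - 1.0) * phi
    ddiff = eps * dpsi(eps * x) * phi + (psi(eps * x) - 1.0) * dphi   # product rule
    return float(np.max(weight * np.abs(diff))), float(np.max(weight * np.abs(ddiff)))

epsilons = [1.0, 0.5, 0.25, 0.125, 0.0625]
results = [seminorms(e) for e in epsilons]
```

The second component contains the term $\epsilon\psi'(\epsilon x)\phi$, which plays the role of $E_\epsilon$ in the proof and is $O(\epsilon)$.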


A tempered distribution is a continuous linear functional on $\mathcal{S}$. The space of
tempered distributions is denoted by $\mathcal{S}'$; it comes equipped with the weak* topology,
that is, the topology of pointwise convergence on $\mathcal{S}$. If $F \in \mathcal{S}'$, then $F|C_c^\infty$ is
clearly a distribution, since convergence in $C_c^\infty$ implies convergence in $\mathcal{S}$, and $F|C_c^\infty$
determines $F$ uniquely by Proposition 9.9. Thus we may, and shall, identify $\mathcal{S}'$ with
the set of distributions that extend continuously from $C_c^\infty$ to $\mathcal{S}$. We say that a locally
integrable function is tempered if it is tempered as a distribution.

The condition that a distribution be tempered means, roughly speaking, that it
does not grow too fast at infinity. Here are a few examples:


• Every compactly supported distribution is tempered.


• If $f \in L^1_{\mathrm{loc}}(\mathbb{R}^n)$ and $\int(1 + |x|)^{-N}|f(x)|\,dx < \infty$ for some $N$, then $f$ is
tempered, for $|\int f\phi| \le C\|\phi\|_{(N,0)}$.

• The function $f(x) = e^{ax}$ on $\mathbb{R}$ is tempered iff $a$ is purely imaginary. Indeed,
suppose $a = b + ic$ with $b, c$ real. If $b = 0$, then $f$ is bounded and hence
tempered by the preceding example. If $b \ne 0$, choose a function $\psi \in C_c^\infty$ such that $\int\psi = 1$, and
let $\phi_j(x) = e^{-ax}\psi(x - j)$. It is easily verified that $\phi_j \to 0$ in $\mathcal{S}$ as $j \to +\infty$
(if $b > 0$) or $j \to -\infty$ (if $b < 0$), but $\int f\phi_j = \int\psi = 1$ for all $j$.


• On the other hand, the function $f(x) = e^x\cos e^x$ on $\mathbb{R}$ is tempered, because it
is the derivative of the bounded function $\sin e^x$. Indeed, if $\phi \in \mathcal{S}$, integration
by parts yields

$$\left|\int f\phi\right| = \left|-\int(\sin e^x)\,\phi'(x)\,dx\right| \le C\|\phi\|_{(2,1)}.$$

Intuitively, $f(x)$ is not too large “on average” when $x$ is large, because of its
rapid oscillations.
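The integration by parts in the last example, and the resulting bound with $C = 2$ (since $\int(1 + |x|)^{-2}\,dx = 2$), can be checked numerically. The test function $\phi(x) = e^{-x^2}$ and the truncation of both integrals to $[-9, 6.5]$ are our own assumptions; the grid has to be fine because $e^x\cos e^x$ oscillates rapidly for large $x$.

```python
import numpy as np

def trapezoid(y, x):
    return float(np.sum((y[1:] + y[:-1]) * np.diff(x)) / 2.0)

x = np.linspace(-9.0, 6.5, 1_000_001)   # phi is negligible outside this interval
phi = np.exp(-x ** 2)                    # hypothetical Schwartz test function
dphi = -2.0 * x * phi

f = np.exp(x) * np.cos(np.exp(x))                  # f(x) = e^x cos e^x
lhs = trapezoid(f * phi, x)                        # int f phi
rhs = -trapezoid(np.sin(np.exp(x)) * dphi, x)      # -int (sin e^x) phi'
seminorm_2_1 = float(np.max((1.0 + np.abs(x)) ** 2 * np.abs(dphi)))   # ||phi||_(2,1) on the grid
```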


We turn to the consideration of the basic linear operations on tempered distributions.
The operations of differentiation, translation, and composition with linear
transformations work just the same way for tempered distributions as for plain
distributions; these operations all map $\mathcal{S}$ and $\mathcal{S}'$ into themselves. The same is not true
of multiplication by arbitrary smooth functions, however. The proper requirement
on $\psi \in C^\infty$ in order for the map $F \mapsto \psi F$ to preserve $\mathcal{S}$ and $\mathcal{S}'$ is that $\psi$ and all its
derivatives should have at most polynomial growth at infinity:

$$|\partial^\alpha\psi(x)| \le C_\alpha(1 + |x|)^{N(\alpha)} \quad \text{for all } \alpha.$$

Such $C^\infty$ functions are called slowly increasing. For example, every polynomial
is slowly increasing; so are the functions $(1 + |x|^2)^s$ $(s \in \mathbb{R})$, which will play an
important role in the next section.

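As for convolutions, for any $F \in \mathcal{S}'$ and $\psi \in \mathcal{S}$ we can define the convolution
$F * \psi$ by $F * \psi(x) = \langle F, \tau_x\widetilde{\psi}\rangle$, as before, and we have an analogue of Proposition
9.3:


9.10 Proposition. If $F \in \mathcal{S}'$ and $\psi \in \mathcal{S}$, then $F * \psi$ is a slowly increasing $C^\infty$
function, and for any $\phi \in \mathcal{S}$ we have $\int(F * \psi)\phi = \langle F, \phi * \widetilde{\psi}\rangle$.


Proof. That $F * \psi \in C^\infty$ is established as in Proposition 9.3. By Proposition
5.15, the continuity of $F$ implies that there exist $m, N, C$ such that

$$|\langle F, \phi\rangle| \le C\sum_{|\alpha|\le N}\|\phi\|_{(m,\alpha)} \qquad (\phi \in \mathcal{S}),$$

and hence by (8.12),

$$\begin{aligned}
|F * \psi(x)| &\le C\sum_{|\alpha|\le N}\sup_y(1 + |y|)^m|\partial^\alpha\psi(x - y)|\\
&\le C(1 + |x|)^m\sum_{|\alpha|\le N}\sup_y(1 + |x - y|)^m|\partial^\alpha\psi(x - y)|\\
&= C(1 + |x|)^m\sum_{|\alpha|\le N}\|\psi\|_{(m,\alpha)}.
\end{aligned}$$

The same reasoning applies with $\psi$ replaced by $\partial^\beta\psi$, so $F * \psi$ is slowly increasing.
Next, by Proposition 9.3 we know that the equation $\int(F * \psi)\phi = \langle F, \phi * \widetilde{\psi}\rangle$ holds
when $\phi, \psi \in C_c^\infty$. By Proposition 9.9, if $\phi, \psi \in \mathcal{S}$ we can find sequences $\{\phi_j\}$ and
$\{\psi_j\}$ in $C_c^\infty$ that converge to $\phi$ and $\psi$ in $\mathcal{S}$. Then $\phi_j * \widetilde{\psi}_j \to \phi * \widetilde{\psi}$ in $\mathcal{S}$ by (the proof
of) Proposition 8.11, so $\langle F, \phi_j * \widetilde{\psi}_j\rangle \to \langle F, \phi * \widetilde{\psi}\rangle$. On the other hand, the preceding
estimates show that $|F * \psi_j(x)| \le C(1 + |x|)^m$ with $C$ and $m$ independent of $j$,
and likewise $|\phi_j(x)| \le C(1 + |x|)^{-m-n-1}$, so $\int(F * \psi_j)\phi_j \to \int(F * \psi)\phi$ by the
dominated convergence theorem. ∎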
Finally, we come to the principal raison d'être of tempered distributions, the
Fourier transform. We recall (Corollary 8.23) that the Fourier transform maps $\mathcal{S}$
continuously into itself, and that for $f, g \in L^1$ (in particular, for $f, g \in \mathcal{S}$) we have

$$\int\widehat{f}(y)g(y)\,dy = \iint f(x)g(y)e^{-2\pi ix\cdot y}\,dx\,dy = \int f(x)\widehat{g}(x)\,dx.$$

We can therefore extend the Fourier transform to a continuous linear map from $\mathcal{S}'$ to
itself by defining

$$\langle\widehat{F}, \phi\rangle = \langle F, \widehat{\phi}\rangle \qquad (F \in \mathcal{S}',\ \phi \in \mathcal{S}).$$

This definition clearly agrees with the one in Chapter 8 when $F \in L^1 + L^2$.

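The basic properties of the Fourier transform in Theorem 8.22 continue to hold in
this setting. To wit,

$$\begin{gathered}
(\tau_yF)^\wedge = e^{-2\pi i\xi\cdot y}\widehat{F}, \qquad \tau_\eta\widehat{F} = (e^{2\pi i\eta\cdot x}F)^\wedge,\\
\partial^\alpha\widehat{F} = [(-2\pi ix)^\alpha F]^\wedge, \qquad (\partial^\alpha F)^\wedge = (2\pi i\xi)^\alpha\widehat{F},\\
(F\circ T)^\wedge = |\det T|^{-1}\,\widehat{F}\circ(T^*)^{-1} \quad (T \in GL(n,\mathbb{R})),\\
(F * \psi)^\wedge = \widehat{\psi}\widehat{F} \quad (\psi \in \mathcal{S}).
\end{gathered}$$

(The first four of these formulas involve products of slowly increasing $C^\infty$ functions,
specified by their values at a general point $x$ or $\xi$, and tempered distributions.) The
easy verifications of these facts are left to the reader (Exercise 17).
Moreover, we can define the inverse transform in the same way:

$$\langle F^\vee, \phi\rangle = \langle F, \phi^\vee\rangle.$$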












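The Fourier inversion formula $\phi = (\widehat{\phi})^\vee = (\phi^\vee)^\wedge$ then extends to $\mathcal{S}'$:

$$\langle(\widehat{F})^\vee, \phi\rangle = \langle\widehat{F}, \phi^\vee\rangle = \langle F, (\phi^\vee)^\wedge\rangle = \langle F, \phi\rangle,$$

so that $(\widehat{F})^\vee = F$, and likewise $(F^\vee)^\wedge = F$. Thus the Fourier transform is an
isomorphism on $\mathcal{S}'$.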
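If $F \in \mathcal{E}'$, there is an alternative way to define $\widehat{F}$. Indeed, $\langle F, \phi\rangle$ makes sense for
any $\phi \in C^\infty$, and if we take $\phi(x) = e^{-2\pi i\xi\cdot x}$, we obtain a function of $\xi$ that has a
strong claim to be called $\widehat{F}(\xi)$. In fact, the two definitions are equivalent:


9.11 Proposition. If $F \in \mathcal{E}'$, then $\widehat{F}$ is a slowly increasing $C^\infty$ function, and it is
given by $\widehat{F}(\xi) = \langle F, E_{-\xi}\rangle$ where $E_\xi(x) = e^{2\pi i\xi\cdot x}$.


Proof. Let $g(\xi) = \langle F, E_{-\xi}\rangle$. Consideration of difference quotients of $g$, as in
the proof of Proposition 9.3, shows that $g$ is a $C^\infty$ function with derivatives given
by $\partial^\alpha g(\xi) = \langle F, \partial_\xi^\alpha E_{-\xi}\rangle = (-2\pi i)^{|\alpha|}\langle F, x^\alpha E_{-\xi}\rangle$. Moreover, by Theorem 9.8 and
Proposition 5.15, there exist $m, N, C$ such that

$$|\partial^\alpha g(\xi)| \le C(2\pi)^{|\alpha|}\sum_{|\beta|\le N}\sup_{x\in V_m}\big|\partial_x^\beta[x^\alpha E_{-\xi}(x)]\big| \le C_\alpha(1 + |\xi|)^N,$$

so $g$ is slowly increasing.

It remains to show that $g = \widehat{F}$, and by Proposition 9.9 it suffices to show that
$\int g\phi = \langle F, \widehat{\phi}\rangle$ for $\phi \in C_c^\infty$. In this case $g\phi \in C_c^\infty$, so $\int g\phi$ can be approximated
by Riemann sums as in the proof of Proposition 9.3, say $\sum_j g(\xi_j)\phi(\xi_j)\,\Delta\xi_j$. The
corresponding sums $\sum_j\phi(\xi_j)e^{-2\pi i\xi_j\cdot x}\,\Delta\xi_j$ and their derivatives in $x$ converge
uniformly, for $x$ in any compact set, to $\widehat{\phi}(x)$ and its derivatives. Therefore, since $F$ is a
continuous functional on $C^\infty$,

$$\int g\phi = \lim\sum_j\langle F, E_{-\xi_j}\rangle\phi(\xi_j)\,\Delta\xi_j = \lim\Big\langle F, \sum_j\phi(\xi_j)E_{-\xi_j}\,\Delta\xi_j\Big\rangle = \langle F, \widehat{\phi}\rangle.$$

∎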

It is time for some examples. First and foremost, the Fourier transform of the
point mass at 0 is the constant function 1: $\langle\delta, E_{-\xi}\rangle = E_{-\xi}(0) = 1$. More generally,
for point masses at other points and their derivatives, we have

$$(\partial^\alpha\tau_y\delta)^\wedge(\xi) = (-1)^{|\alpha|}\langle\delta, \tau_{-y}\partial^\alpha E_{-\xi}\rangle = (-1)^{|\alpha|}\partial_x^\alpha\big(e^{-2\pi i\xi\cdot(x+y)}\big)\Big|_{x=0} = (2\pi i\xi)^\alpha e^{-2\pi i\xi\cdot y}.$$

In particular:


9.12 Proposition. The Fourier transforms of the linear combinations of $\delta$ and its
derivatives are precisely the polynomials.
The Fourier inversion theorem then yields the formulas for the Fourier transforms
of polynomials and imaginary exponentials:

$$(x^\alpha)^\wedge = [(-x)^\alpha]^\vee = (-2\pi i)^{-|\alpha|}\partial^\alpha\delta, \qquad \widehat{E_y} = \tau_y\delta. \tag{9.13}$$

As an illustration of the heuristics associated to these results, consider the formula

$$\int e^{2\pi ix\cdot\xi}\,d\xi = \delta(x).$$

Although this is nonsensical as a pointwise equality, it is valid when viewed from the
right angle. On the one hand, it expresses the fact that the Fourier transform of the
constant function 1 is $\delta$. More interestingly, it is a concise statement of the Fourier
inversion theorem. Indeed, if we replace $x$ by $x - y$, integrate both sides against
$\phi(y)\,dy$ with $\phi \in \mathcal{S}$, and reverse the order of integration on the left, we obtain

$$\iint e^{2\pi i(x-y)\cdot\xi}\phi(y)\,dy\,d\xi = \int\delta(x - y)\phi(y)\,dy.$$

The integral on the left is $(\widehat{\phi})^\vee(x)$, and the integral on the right equals $\phi(x)$!
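The heuristic formula can be made concrete by truncating the $\xi$-integral: $\int_{-R}^{R}e^{2\pi ix\xi}\,d\xi = \sin(2\pi Rx)/(\pi x)$, and pairing this kernel with $\phi$ should approach $\phi(0)$ as $R \to \infty$. For the Gaussian $\phi(x) = e^{-\pi x^2}$ (our choice, for which $\widehat{\phi} = \phi$), the pairing equals $\int_{-R}^{R}\widehat{\phi} = \operatorname{erf}(\sqrt{\pi}R)$, which the one-dimensional sketch below checks numerically.

```python
import math
import numpy as np

def trapezoid(y, x):
    return float(np.sum((y[1:] + y[:-1]) * np.diff(x)) / 2.0)

x = np.linspace(-8.0, 8.0, 160001)   # e^{-pi x^2} is negligible outside [-8, 8]
phi = np.exp(-np.pi * x ** 2)        # hypothetical test function with phi-hat = phi and phi(0) = 1

def truncated_pairing(R):
    # int_{-R}^{R} e^{2 pi i x xi} d xi = sin(2 pi R x)/(pi x) = 2R * sinc(2Rx) (NumPy's normalized sinc).
    kernel = 2.0 * R * np.sinc(2.0 * R * x)
    return trapezoid(kernel * phi, x)

pairings = {R: truncated_pairing(R) for R in (0.25, 0.5, 1.0, 2.0)}
```

Using `np.sinc` avoids a separate treatment of the removable singularity at $x = 0$.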

It is an important fact that every distribution is, at least locally, a linear combination
of derivatives of continuous functions. The Fourier transform yields an easy proof of
this:


9.14 Proposition.
a. If $F \in \mathcal{E}'$, there exist $N \in \mathbb{N}$, constants $c_\alpha$ $(|\alpha| \le N)$, and $f \in C_0(\mathbb{R}^n)$ such
that $F = \sum_{|\alpha|\le N}c_\alpha\partial^\alpha f$.
b. If $F \in \mathcal{D}'(U)$ and $V$ is a precompact open set with $\overline{V} \subset U$, there exist $N$, $c_\alpha$, $f$
as above such that $F = \sum_{|\alpha|\le N}c_\alpha\partial^\alpha f$ on $V$.


Proof. By Proposition 9.11, if $F \in \mathcal{E}'$ then $\widehat{F}$ is slowly increasing, so the function
$g(\xi) = (1 + |\xi|^2)^{-M}\widehat{F}(\xi)$ will be in $L^1$ if the integer $M$ is chosen sufficiently large.
Let $f = g^\vee$; then $f \in C_0$ and $\widehat{F} = (1 + |\xi|^2)^M\widehat{f}$, so $F = (I - (4\pi^2)^{-1}\Delta)^Mf$.
This proves (a); for (b), choose $\psi \in C_c^\infty(U)$ such that $\psi = 1$ on $\overline{V}$, and apply (a) to
$\psi F$. ∎


We conclude this section with a sketch of the theory of periodic distributions;
some of the details are fleshed out in Exercises 22-24.

The space $C^\infty(\mathbb{T}^n)$ of smooth periodic functions is a Fréchet space with the
topology defined by the seminorms $\phi \mapsto \|\partial^\alpha\phi\|_u$, and a distribution on $\mathbb{T}^n$ is a
continuous linear functional on this space; the space of distributions on $\mathbb{T}^n$ is denoted
by $\mathcal{D}'(\mathbb{T}^n)$. If $F \in \mathcal{D}'(\mathbb{T}^n)$, its Fourier transform is the function $\widehat{F}$ on $\mathbb{Z}^n$ defined by
$\widehat{F}(\kappa) = \langle F, E_{-\kappa}\rangle$ where $E_\kappa(x) = e^{2\pi i\kappa\cdot x}$. Since $F$ satisfies an estimate of the form
$|\langle F, \phi\rangle| \le C\sum_{|\alpha|\le N}\|\partial^\alpha\phi\|_u$, there exist $C, N$ such that

$$|\widehat{F}(\kappa)| \le C(1 + |\kappa|)^N, \tag{9.15}$$

and the Fourier transform is an isomorphism from $\mathcal{D}'(\mathbb{T}^n)$ to the space of all functions
on $\mathbb{Z}^n$ satisfying such an estimate. Moreover, if $F \in \mathcal{D}'(\mathbb{T}^n)$, the Fourier series
$\sum_\kappa\widehat{F}(\kappa)E_\kappa$ converges in $\mathcal{D}'(\mathbb{T}^n)$ to $F$.

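Instead of defining periodic distributions as distributions on $\mathbb{T}^n$ (linear functionals
on $C^\infty(\mathbb{T}^n)$), one can define them as distributions on $\mathbb{R}^n$ (linear functionals on
$C_c^\infty(\mathbb{R}^n)$) that are invariant under the translations $\tau_\kappa$, $\kappa \in \mathbb{Z}^n$. Accordingly, let

$$\mathcal{D}'(\mathbb{R}^n)_{\mathrm{per}} = \{F \in \mathcal{D}'(\mathbb{R}^n) : \tau_\kappa F = F \text{ for } \kappa \in \mathbb{Z}^n\}.$$

The periodization map $P\phi = \sum_{\kappa\in\mathbb{Z}^n}\tau_\kappa\phi$ used in Theorem 8.31 is easily seen to map
$C_c^\infty(\mathbb{R}^n)$ continuously into $C^\infty(\mathbb{T}^n)$, so it induces a map $P': \mathcal{D}'(\mathbb{T}^n) \to \mathcal{D}'(\mathbb{R}^n)$
given by $\langle P'F, \phi\rangle = \langle F, P\phi\rangle$. Since $P\tau_\kappa = P$ for $\kappa \in \mathbb{Z}^n$, we have $\tau_\kappa \circ P' = P'$,
that is, the range of $P'$ lies in $\mathcal{D}'(\mathbb{R}^n)_{\mathrm{per}}$. In fact, $P': \mathcal{D}'(\mathbb{T}^n) \to \mathcal{D}'(\mathbb{R}^n)_{\mathrm{per}}$ is a
bijection. (The proof is nontrivial; see Exercise 24.) Moreover, if $f \in L^1(\mathbb{T}^n)$, then
$f$ and $P'f$ coincide as periodic functions on $\mathbb{R}^n$, for if $\phi \in C_c^\infty(\mathbb{R}^n)$,

$$\langle P'f, \phi\rangle = \int_{\mathbb{T}^n} f\,P\phi = \sum_{\kappa\in\mathbb{Z}^n}\int_{[0,1)^n} f(x)\phi(x - \kappa)\,dx = \sum_{\kappa\in\mathbb{Z}^n}\int_{[0,1)^n - \kappa} f(x)\phi(x)\,dx = \int_{\mathbb{R}^n} f\phi.$$

Thus the two descriptions of periodic distributions are equivalent.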

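If $F \in \mathcal{D}'(\mathbb{T}^n)$, the Fourier series $\sum\widehat{F}(\kappa)E_\kappa$ converges in $\mathcal{D}'(\mathbb{T}^n)$ to $F$; on the
other hand, it follows easily from (9.15) that it also converges in $\mathcal{S}'(\mathbb{R}^n)$, and its sum
there is $P'F$. Thus $\mathcal{D}'(\mathbb{R}^n)_{\mathrm{per}} \subset \mathcal{S}'(\mathbb{R}^n)$, and by (9.13) we have

$$(P'F)^\wedge = \sum\widehat{F}(\kappa)\widehat{E_\kappa} = \sum\widehat{F}(\kappa)\tau_\kappa\delta,$$

giving the relation between the $\mathbb{R}^n$- and $\mathbb{T}^n$-Fourier transforms for periodic distributions.
In particular, if $F = \delta_{\mathbb{T}^n}$, the point mass at the origin in $\mathbb{T}^n$, then $\widehat{F}(\kappa) = 1$
for all $\kappa$; hence $P'F$ and $(P'F)^\wedge$ are both equal to $\sum\tau_\kappa\delta$, a restatement of the
Poisson summation formula.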
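Pairing $\sum_\kappa\tau_\kappa\delta$ and its Fourier transform with a test function $\phi$ gives the classical form $\sum_\kappa\phi(\kappa) = \sum_\kappa\widehat{\phi}(\kappa)$. A quick one-dimensional check uses the Gaussian $\phi(x) = e^{-\pi tx^2}$, $t > 0$, for which $\widehat{\phi}(\xi) = t^{-1/2}e^{-\pi\xi^2/t}$; the truncation $|k| \le 60$ is our own choice and is far more than enough for the values of $t$ used.

```python
import math

def theta(t, K=60):
    # sum over k in Z of exp(-pi t k^2), truncated to |k| <= K
    return sum(math.exp(-math.pi * t * k * k) for k in range(-K, K + 1))

ts = (0.1, 0.5, 1.0, 3.0)
lhs = {t: theta(t) for t in ts}                        # sum of phi(k)
rhs = {t: theta(1.0 / t) / math.sqrt(t) for t in ts}   # sum of phi-hat(k)
```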


Exercises 


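16. Suppose $F \in \mathcal{E}'$ and $\psi \in C^\infty$. Show that for any $\phi \in C_c^\infty$, $\int\langle F, \tau_x\widetilde{\psi}\rangle\phi(x)\,dx = \langle F, \phi * \widetilde{\psi}\rangle$.
(The result can be reduced to Proposition 9.3; given $F$ and $\phi$, the indicated
expressions depend only on the values of $\psi$ in a compact set.)
17. Suppose that $F \in \mathcal{S}'$. Show that

a. $(\tau_yF)^\wedge = e^{-2\pi i\xi\cdot y}\widehat{F}$, $\tau_\eta\widehat{F} = [e^{2\pi i\eta\cdot x}F]^\wedge$.

b. $\partial^\alpha\widehat{F} = [(-2\pi ix)^\alpha F]^\wedge$, $(\partial^\alpha F)^\wedge = (2\pi i\xi)^\alpha\widehat{F}$.

c. $(F\circ T)^\wedge = |\det T|^{-1}\,\widehat{F}\circ(T^*)^{-1}$ for $T \in GL(n,\mathbb{R})$.

d. $(F * \psi)^\wedge = \widehat{\psi}\widehat{F}$ for $\psi \in \mathcal{S}$.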
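18. If $n = l + m$, let us write $x \in \mathbb{R}^n$ as $(y, z)$ with $y \in \mathbb{R}^l$ and $z \in \mathbb{R}^m$. Let
$\mathcal{F}$ denote the Fourier transform on $\mathbb{R}^n$ and $\mathcal{F}_1$, $\mathcal{F}_2$ the partial Fourier transforms
in the first and second sets of variables, i.e., $\mathcal{F}_1f(\eta, z) = \int f(y, z)e^{-2\pi i\eta\cdot y}\,dy$
and likewise for $\mathcal{F}_2$. Then $\mathcal{F}_1$ and $\mathcal{F}_2$ are isomorphisms on $\mathcal{S}(\mathbb{R}^n)$ and $\mathcal{S}'(\mathbb{R}^n)$, and
$\mathcal{F} = \mathcal{F}_1\mathcal{F}_2 = \mathcal{F}_2\mathcal{F}_1$.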


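19. On $\mathbb{R}$, let $F_0 = PV(1/x)$ as defined in Exercise 10. Also, for $\epsilon > 0$ let
$F_\epsilon(x) = x(x^2 + \epsilon^2)^{-1}$, $G_\epsilon^\pm(x) = (x \pm i\epsilon)^{-1}$, and $S_\epsilon(x) = e^{-2\pi\epsilon|x|}\operatorname{sgn}x$.
a. $\lim_{\epsilon\to 0}F_\epsilon = F_0$ in the weak* topology of $\mathcal{S}'$. (Theorem 8.14, with $a = 0$,
may be useful.)
b. $\lim_{\epsilon\to 0}G_\epsilon^\pm = F_0 \mp \pi i\delta$. (Hint: $(x \pm i\epsilon)^{-1} = (x \mp i\epsilon)(x^2 + \epsilon^2)^{-1}$.)
c. $\widehat{S_\epsilon} = (\pi i)^{-1}F_\epsilon$, and hence $\widehat{\operatorname{sgn}} = (\pi i)^{-1}F_0$.
d. From (c) it follows that $\widehat{F_0} = -\pi i\operatorname{sgn}$. Prove this directly by showing
that $\widehat{F_0} = \lim_{\epsilon\to 0,\,N\to\infty}\widehat{H_{\epsilon,N}}$, where $H_{\epsilon,N}(x) = x^{-1}$ if $\epsilon < |x| < N$ and
$H_{\epsilon,N}(x) = 0$ otherwise, and using Exercise 59b in §2.6.
e. Compute $\widehat{\chi_{(0,\infty)}}$ (i) by writing $\chi_{(0,\infty)} = \tfrac12\operatorname{sgn} + \tfrac12$ and using (c), (ii) by
writing $\chi_{(0,\infty)}(x) = \lim_{\epsilon\to 0}e^{-\epsilon x}\chi_{(0,\infty)}(x)$ and using (b).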


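20. Suppose that $F \in \mathcal{S}'$ and $G \in \mathcal{E}'$.
a. $\widehat{F}\widehat{G}$ is a well-defined element of $\mathcal{S}'$.
b. If $\psi \in \mathcal{S}$, then $G * \psi \in \mathcal{S}$.
c. Let $F * G$ (or $G * F$) be the tempered distribution such that $(F * G)^\wedge = \widehat{F}\widehat{G}$.
Then $\langle F * G, \psi\rangle = \langle F, \widetilde{G} * \psi\rangle = \langle G, \widetilde{F} * \psi\rangle$ for $\psi \in \mathcal{S}$.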


21. Suppose that $F, G, H \in \mathcal{S}'$.
a. If at most one of $F, G, H$ has noncompact support, then $(F * G) * H = F * (G * H)$,
where the convolutions are defined as in Exercise 20.
b. On $\mathbb{R}$, let $F$ be the constant function 1, $G = d\delta/dx$, and $H = \chi_{(0,\infty)}$. Then
$(F * G) * H$ and $F * (G * H)$ are well defined in $\mathcal{S}'$ but are unequal.


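22. Let $E_\kappa(x) = e^{2\pi i\kappa\cdot x}$. If $g: \mathbb{Z}^n \to \mathbb{C}$ satisfies $|g(\kappa)| \le C(1 + |\kappa|)^N$ for some
$C, N > 0$, then the series $\sum_{\kappa\in\mathbb{Z}^n}g(\kappa)E_\kappa$ converges in $\mathcal{D}'(\mathbb{T}^n)$ to a distribution
$F$ that satisfies $\widehat{F} = g$. It also converges in $\mathcal{S}'(\mathbb{R}^n)$ to a tempered distribution $G$
($= P'F$) such that $\tau_\kappa G = G$ for all $\kappa$.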


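23. Suppose that $F, G \in \mathcal{D}'(\mathbb{T}^n)$.


a. There is a unique $F * G \in \mathcal{D}'(\mathbb{T}^n)$ such that $(F * G)^\wedge = \widehat{F}\widehat{G}$. (Use Exercise
22.)
b. If $G \in C^\infty(\mathbb{T}^n)$, then $F * G \in C^\infty(\mathbb{T}^n)$ and $F * G(x) = \langle F, \tau_x\widetilde{G}\rangle$ as on $\mathbb{R}^n$.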


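24. Let $P$ be the periodization map, $P\phi = \sum_{\kappa\in\mathbb{Z}^n}\tau_\kappa\phi$.
a. $P$ is a continuous linear map from $C_c^\infty(\mathbb{R}^n)$ to $C^\infty(\mathbb{T}^n)$. (Note that for
$\phi \in C_c^\infty$ and $x$ in a compact set, only finitely many terms of the series $\sum\tau_\kappa\phi(x)$
are nonzero.)
b. Choose $\psi \in C_c^\infty$ with $\int\psi = 1$, and let $\omega = \psi * \chi_{(0,1)^n}$. Then $\omega \in C_c^\infty$ and
$P\omega = 1$.
c. If $\phi \in C^\infty(\mathbb{T}^n)$, then $\phi = P(\omega\phi)$ (where $\phi$ is regarded as a function on $\mathbb{T}^n$
on the left and as a function on $\mathbb{R}^n$ on the right). Consequently, $P: C_c^\infty(\mathbb{R}^n) \to C^\infty(\mathbb{T}^n)$
is surjective and the dual map $P': \mathcal{D}'(\mathbb{T}^n) \to \mathcal{D}'(\mathbb{R}^n)_{\mathrm{per}}$ is injective.
d. Given $G \in \mathcal{D}'(\mathbb{R}^n)_{\mathrm{per}}$, define $F \in \mathcal{D}'(\mathbb{T}^n)$ by $\langle F, \phi\rangle = \langle G, \omega\phi\rangle$ (with the
same understanding as in part (c)). Then $P'F = G$, so $P'$ maps $\mathcal{D}'(\mathbb{T}^n)$ onto
$\mathcal{D}'(\mathbb{R}^n)_{\mathrm{per}}$.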
25. Suppose that $P$ is a polynomial in $n$ variables such that the only zero of $P(\xi)$ in $\mathbb{R}^n$
is $\xi = 0$, and let $P(D)$ be as in §8.7.
a. Every tempered distribution $F$ that satisfies $P(D)F = 0$ is a polynomial. (Use
Proposition 9.12 and Exercise 11.)
b. Every bounded function $f$ that satisfies $P(D)f = 0$ is a constant. (This
result, for the special cases where $P(D)$ is the Laplacian or the Cauchy-Riemann
operator $\partial_x + i\partial_y$ on $\mathbb{R}^2$, is known as Liouville's theorem.)


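26. On $\mathbb{R}^n \times \mathbb{R}$, let $G(x, t) = (4\pi t)^{-n/2}e^{-|x|^2/4t}\chi_{(0,\infty)}(t)$.
a. $\widehat{G}$ is the tempered function $\widehat{G}(\xi, \tau) = (2\pi i\tau + 4\pi^2|\xi|^2)^{-1}$. (Use Proposition
8.24 and Exercise 18.)
b. Deduce that $(\partial_t - \Delta)G = \delta$. (Cf. Exercise 15.)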


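27. Suppose that $0 < \operatorname{Re}\alpha < n$.
a. For any $\phi \in \mathcal{S}$,

$$\frac{\Gamma((n-\alpha)/2)}{\pi^{(n-\alpha)/2}}\int|\xi|^{\alpha-n}\widehat{\phi}(\xi)\,d\xi = \frac{\Gamma(\alpha/2)}{\pi^{\alpha/2}}\int|x|^{-\alpha}\phi(x)\,dx.$$

(Hint: By Proposition 8.24 and Lemma 8.25, if $t > 0$ we have

$$\int e^{-\pi t|\xi|^2}\widehat{\phi}(\xi)\,d\xi = t^{-n/2}\int e^{-\pi|x|^2/t}\phi(x)\,dx.$$

Multiply both sides by $t^{-1+(n-\alpha)/2}\,dt$ and integrate from 0 to $\infty$.)

b. Let $R_\alpha(x) = \Gamma((n-\alpha)/2)[\Gamma(\alpha/2)2^\alpha\pi^{n/2}]^{-1}|x|^{\alpha-n}$. Then $R_\alpha$ is a tempered
function and $\widehat{R_\alpha}$ is the tempered function $\widehat{R_\alpha}(\xi) = (2\pi|\xi|)^{-\alpha}$.

c. If $n > 2$, then $\Delta R_2 = -\delta$. (Cf. Exercise 14. See the next exercise for the
case $n = 2$.)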


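28. Suppose $n = 2$. For $0 < \operatorname{Re}\alpha < 2$, let $c_\alpha = \Gamma((2-\alpha)/2)[\Gamma(\alpha/2)2^\alpha\pi]^{-1}$
and $Q_\alpha(x) = c_\alpha(|x|^{\alpha-2} - 1)$. (Note that $Q_\alpha$ differs by a constant from the $R_\alpha$ in
Exercise 27.)
a. $\lim_{\alpha\to 2}Q_\alpha(x) = -(2\pi)^{-1}\log|x|$, pointwise and in $\mathcal{S}'$.
b. By (a), $\lim_{\alpha\to 2}Q_\alpha$ exists in $\mathcal{S}'$, and by Exercise 27b, $\widehat{Q_\alpha}(\xi) = (2\pi|\xi|)^{-\alpha} - c_\alpha\delta$.
Noting that $(2\pi|\xi|)^{-2}$ is not integrable near the origin and that $\lim_{\alpha\to 2}c_\alpha = \infty$,
find an explicit formula for $\lim_{\alpha\to 2}\widehat{Q_\alpha}$. (Exercise 12 may help.)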


29. For $1 \le p \le \infty$, let $\mathcal{C}_p$ be the set of all $F \in \mathcal{S}'$ for which there exists $C > 0$
such that $\|F * \phi\|_p \le C\|\phi\|_p$ for all $\phi \in \mathcal{S}$, so that the map $\phi \mapsto F * \phi$ extends to a
bounded operator on $L^p$.
a. $\mathcal{C}_1 = M(\mathbb{R}^n)$. (If $F \in \mathcal{C}_1$, consider $F * \phi_j$ where $\{\phi_j\}$ is an approximate
identity, and apply Alaoglu's theorem.)

b. $\mathcal{C}_2 = \{F \in \mathcal{S}' : \widehat{F} \in L^\infty\}$. (Use the Plancherel theorem.)

c. If $p$ and $q$ are conjugate exponents, then $\mathcal{C}_p = \mathcal{C}_q$. (Hint: $\int(F * \phi)\widetilde{\psi} = \int(F * \psi)\widetilde{\phi}$.)

d. If $1 < p < 2$ and $q$ is the conjugate exponent to $p$, then $\mathcal{C}_p \subset \mathcal{C}_r$ for all
$r \in (p, q)$. (Use the Riesz-Thorin theorem.)

e. $\mathcal{C}_1 \subset \mathcal{C}_p \subset \mathcal{C}_2$ for all $p \in (1, \infty)$.


9.3 SOBOLEV SPACES 


One of the most satisfactory ways of measuring smoothness properties of functions and distributions is in terms of $L^2$ norms. There are two reasons for this: $L^2$ has the advantage of being a Hilbert space, and the Fourier transform, which converts differentiation into multiplication by the coordinate functions, is an isometry on $L^2$.

As a first step, suppose $k \in \mathbb{N}$ and let $H_k$ be the space of all functions $f \in L^2(\mathbb{R}^n)$ whose distribution derivatives $\partial^\alpha f$ are $L^2$ functions for $|\alpha| \le k$. One can make $H_k$ into a Hilbert space by imposing the inner product
$$(f, g) \mapsto \sum_{|\alpha|\le k}\int (\partial^\alpha f)\,\overline{(\partial^\alpha g)}.$$

However, it is more convenient to use an equivalent inner product defined in terms of the Fourier transform. Theorem 8.22e and the Plancherel theorem imply that $f \in H_k$ iff $\xi^\alpha\widehat{f} \in L^2$ for $|\alpha| \le k$. A simple modification of the argument in the proof of Proposition 8.3 shows that there exist $C_1, C_2 > 0$ such that
$$C_1(1+|\xi|^2)^k \le \sum_{|\alpha|\le k}|\xi^\alpha|^2 \le C_2(1+|\xi|^2)^k,$$
from which it follows that $f \in H_k$ iff $(1+|\xi|^2)^{k/2}\widehat{f} \in L^2$ and that the norms
$$f \mapsto \Big(\sum_{|\alpha|\le k}\|\partial^\alpha f\|_2^2\Big)^{1/2} \quad\text{and}\quad f \mapsto \big\|(1+|\xi|^2)^{k/2}\widehat{f}\,\big\|_2$$
are equivalent. The latter norm, however, makes sense for any $k \in \mathbb{R}$, and we can use it to extend the definition of $H_k$ to all real $k$.

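We proceed to the formal definitions. For any $s \in \mathbb{R}$ the function $\xi \mapsto (1+|\xi|^2)^{s/2}$ is $C^\infty$ and slowly increasing (Exercise 30), so the map $\Lambda_s$ defined by
$$\Lambda_s f = \big[(1+|\xi|^2)^{s/2}\widehat{f}\,\big]^\vee$$
is a continuous linear operator on $\mathcal{S}'$; in fact it is an isomorphism, since $\Lambda_s^{-1} = \Lambda_{-s}$. If $s \in \mathbb{R}$, we define the Sobolev space $H_s$ to be
$$H_s = \{f \in \mathcal{S}' : \Lambda_s f \in L^2\},$$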
and we define an inner product and norm on $H_s$ by
$$(f, g)_{(s)} = \int (\Lambda_s f)\,\overline{(\Lambda_s g)} = \int \widehat{f}(\xi)\,\overline{\widehat{g}(\xi)}\,(1+|\xi|^2)^s\,d\xi,$$
$$\|f\|_{(s)} = \|\Lambda_s f\|_2 = \Big[\int |\widehat{f}(\xi)|^2(1+|\xi|^2)^s\,d\xi\Big]^{1/2}.$$
(The equality of the two formulas for $(f,g)_{(s)}$ and for $\|f\|_{(s)}$ follows from the Plancherel theorem.) Note that the inner products $(\cdot,\cdot)_{(s)}$ are conjugate linear in the second variable, but we are continuing to use the notation $\langle\cdot,\cdot\rangle$ for the bilinear pairing between $\mathcal{S}'$ and $\mathcal{S}$. This will cause no confusion, since we shall not be using the inner products $(\cdot,\cdot)_{(s)}$ explicitly.

The following properties of Sobolev spaces are simple consequences of the definitions and the preceding discussion:

i. The Fourier transform is a unitary isomorphism from $H_s$ to $L^2(\mathbb{R}^n, \mu_s)$ where $d\mu_s(\xi) = (1+|\xi|^2)^s\,d\xi$. In particular, $H_s$ is a Hilbert space.
ii. $\mathcal{S}$ is a dense subspace of $H_s$ for all $s \in \mathbb{R}$. (This follows easily from (i) and Proposition 8.17.)
iii. If $t < s$, $H_s$ is a dense subspace of $H_t$ in the topology of $H_t$, and $\|\cdot\|_{(t)} \le \|\cdot\|_{(s)}$.
iv. $\Lambda_t$ is a unitary isomorphism from $H_s$ to $H_{s-t}$ for all $s, t \in \mathbb{R}$.
v. $H_0 = L^2$ and $\|\cdot\|_{(0)} = \|\cdot\|_2$ (by Plancherel).
vi. $\partial^\alpha$ is a bounded linear map from $H_s$ to $H_{s-|\alpha|}$ for all $s$, $\alpha$ (because $|\xi^\alpha| \le (1+|\xi|^2)^{|\alpha|/2}$).

By (iii) and (v), for $s \ge 0$ the distributions in $H_s$ are $L^2$ functions. For $s < 0$ the elements of $H_s$ are generally not functions. For example, the point mass $\delta$ is in $H_s$ iff $s < -\frac12 n$, for $\widehat{\delta}$ is the constant function $1$, and $\int (1+|\xi|^2)^s\,d\xi < \infty$ iff $s < -\frac12 n$. Another example: The distribution $W_t$ whose Fourier transform is $(2\pi|\xi|)^{-1}\sin 2\pi t|\xi|$, which arose in the discussion of the wave equation in §8.7, is in $H_s$ iff $s < 1 - \frac12 n$; it is in $L^1 \cap L^2$ when $n = 1$ and in $L^1 \setminus L^2$ for $n = 2$, but is not a function for $n \ge 3$.
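Whether $\delta$ belongs to $H_s$ comes down to the convergence of $\int (1+|\xi|^2)^s\,d\xi$. The sketch below works in dimension $n = 1$ and uses a midpoint rule to compute the truncated integrals over $|\xi| \le R$. The helper name is illustrative. For $s = -1 < -\frac12$ the values settle near $\pi$. At the borderline $s = -\frac12$ they keep growing like $2\sinh^{-1}R$. A finite computation can only suggest divergence, not prove it.

```python
import math


def truncated_delta_norm_sq(s, R, steps=100_000):
    """Midpoint-rule value of ∫_{|xi| <= R} (1 + xi^2)^s dxi in dimension n = 1.

    Since delta-hat = 1, this is the contribution of |xi| <= R to ||delta||_(s)^2.
    """
    h = R / steps
    half = h * sum((1 + ((i + 0.5) * h) ** 2) ** s for i in range(steps))
    return 2 * half  # the integrand is even in xi


for s in (-1.0, -0.5):
    values = [truncated_delta_norm_sq(s, R) for R in (1e2, 1e3, 1e4)]
    print(s, [round(v, 4) for v in values])
```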


9.16 Proposition. If $s \in \mathbb{R}$, the duality between $\mathcal{S}'$ and $\mathcal{S}$ induces a unitary isomorphism from $H_{-s}$ to $(H_s)^*$. More precisely, if $f \in H_{-s}$, the functional $\phi \mapsto \langle f, \phi\rangle$ on $\mathcal{S}$ extends to a continuous linear functional on $H_s$ with operator norm equal to $\|f\|_{(-s)}$, and every element of $(H_s)^*$ arises in this fashion.


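Proof. If $f \in H_{-s}$ and $\phi \in \mathcal{S}$,
$$|\langle f, \phi\rangle| = |\langle (f^\vee)^\wedge, \phi\rangle| = |\langle f^\vee, \widehat{\phi}\rangle| = \Big|\int f^\vee(\xi)\,\widehat{\phi}(\xi)\,d\xi\Big|,$$
since $f^\vee(\xi) = \widehat{f}(-\xi)$ is a tempered function. Thus by the Schwarz inequality,
$$|\langle f, \phi\rangle| \le \Big[\int |f^\vee(\xi)|^2(1+|\xi|^2)^{-s}\,d\xi \int |\widehat{\phi}(\xi)|^2(1+|\xi|^2)^s\,d\xi\Big]^{1/2} = \|f\|_{(-s)}\|\phi\|_{(s)},$$
so the functional $\phi \mapsto \langle f, \phi\rangle$ extends continuously to $H_s$, with norm at most $\|f\|_{(-s)}$. In fact, its norm equals $\|f\|_{(-s)}$, since if $g \in \mathcal{S}'$ is the distribution whose Fourier transform is $\widehat{g}(\xi) = (1+|\xi|^2)^{-s}\,\overline{f^\vee(\xi)}$, we have $g \in H_s$ and
$$|\langle f, g\rangle| = \int |f^\vee(\xi)|^2(1+|\xi|^2)^{-s}\,d\xi = \|f\|_{(-s)}^2 = \|f\|_{(-s)}\|g\|_{(s)}.$$
Finally, if $G \in (H_s)^*$, then $G \circ \mathcal{F}^{-1}$ is a bounded linear functional on $L^2(\mu_s)$ where $d\mu_s(\xi) = (1+|\xi|^2)^s\,d\xi$, so there exists $g \in L^2(\mu_s)$ such that
$$G(\phi) = G \circ \mathcal{F}^{-1}(\widehat{\phi}) = \int \widehat{\phi}(\xi)\,g(\xi)\,(1+|\xi|^2)^s\,d\xi.$$
But then $G(\phi) = \langle f, \phi\rangle$ where $f^\vee(\xi) = (1+|\xi|^2)^s g(\xi)$, and $f \in H_{-s}$ since
$$\|f\|_{(-s)}^2 = \int |f^\vee(\xi)|^2(1+|\xi|^2)^{-s}\,d\xi = \int |g(\xi)|^2(1+|\xi|^2)^s\,d\xi < \infty. \qquad \blacksquare$$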


For $s > 0$, the elements of $H_s$ are $L^2$ functions that are "$L^2$-differentiable up to order $s$," and it is natural to ask what is the relationship between this notion of smoothness and ordinary differentiability. Of course, if one thinks of elements of $H_s$ as distributions or elements of $L^2$, there is no distinction among functions that agree almost everywhere; from this perspective, when one says that a function in $H_s$ is of class $C^k$, one means that it agrees a.e. with a $C^k$ function. With this understanding, the question just posed has a simple and elegant answer. We introduce the notation
$$C_0^k = \{f \in C^k(\mathbb{R}^n) : \partial^\alpha f \in C_0 \text{ for } |\alpha| \le k\}.$$
$C_0^k$ is a Banach space with the norm $f \mapsto \sum_{|\alpha|\le k}\|\partial^\alpha f\|_u$.


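9.17 The Sobolev Embedding Theorem. Suppose $s > k + \frac12 n$.

a. If $f \in H_s$, then $(\partial^\alpha f)^\wedge \in L^1$ and $\|(\partial^\alpha f)^\wedge\|_1 \le C\|f\|_{(s)}$ for $|\alpha| \le k$, where $C$ depends only on $k - s$.
b. $H_s \subset C_0^k$, and the inclusion map is continuous.

Proof. By the Schwarz inequality,
$$\int |(\partial^\alpha f)^\wedge(\xi)|\,d\xi = \int |(2\pi i\xi)^\alpha\widehat{f}(\xi)|\,d\xi \le (2\pi)^{|\alpha|}\int |\widehat{f}(\xi)|(1+|\xi|^2)^{s/2}(1+|\xi|^2)^{(k-s)/2}\,d\xi$$
$$\le (2\pi)^{|\alpha|}\Big[\int |\widehat{f}(\xi)|^2(1+|\xi|^2)^s\,d\xi\Big]^{1/2}\Big[\int (1+|\xi|^2)^{k-s}\,d\xi\Big]^{1/2}.$$
The first bracketed factor on the right is $\|f\|_{(s)}$, and the second one is finite by Corollary 2.52 since $2(k - s) < -n$. This proves (a), and (b) follows by the Fourier inversion theorem and the Riemann-Lebesgue lemma. ∎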


9.18 Corollary. If $f \in H_s$ for all $s$, then $f \in C^\infty$.


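An example may help to elucidate this theorem. Let $f_\lambda(x) = \phi(x)|x|^\lambda$, where $\lambda \in \mathbb{R}$ and $\phi \in C_c^\infty$ with $\phi = 1$ on a neighborhood of $0$. Then the (classical) derivative $\partial^\alpha f_\lambda$ is $C^\infty$ except at $0$ and is homogeneous of degree $\lambda - |\alpha|$ near $0$, so that $|\partial^\alpha f_\lambda(x)| \le C_{\lambda,\alpha}|x|^{\lambda-|\alpha|}$, and in particular $\partial^\alpha f_\lambda \in L^1$ provided $\lambda - |\alpha| > -n$. In this case $\partial^\alpha f_\lambda$, as an $L^1$ function, is also the distribution derivative of $f_\lambda$. (To see this, replace $f_\lambda$ by the $C^\infty$ function $\phi(x)(|x|^2 + \epsilon^2)^{\lambda/2}$ and consider the limit as $\epsilon \to 0$.) Moreover, $\partial^\alpha f_\lambda \in L^2$ iff $\lambda - |\alpha| > -\frac12 n$, so $f_\lambda \in H_k$ ($k = 0, 1, 2, \ldots$) iff $\lambda > k - \frac12 n$, whereas $f_\lambda \in C_0^k$ iff $\lambda > k$. See also Exercises 33-35 for some related results.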

Next, we show that multiplication by suitably smooth functions preserves the $H_s$ spaces. We need a lemma:


9.19 Lemma. For all $\xi, \eta \in \mathbb{R}^n$ and $s \in \mathbb{R}$,
$$(1+|\xi|^2)^s(1+|\eta|^2)^{-s} \le 2^{|s|}(1+|\xi-\eta|^2)^{|s|}.$$
Proof. Since $|\xi| \le |\xi - \eta| + |\eta|$, we have $|\xi|^2 \le 2(|\xi-\eta|^2 + |\eta|^2)$ and hence
$$1 + |\xi|^2 \le 2(1 + |\xi-\eta|^2)(1 + |\eta|^2).$$
If $s \ge 0$, we have merely to raise both sides to the $s$th power. If $s < 0$, we interchange $\xi$ and $\eta$ and replace $s$ by $-s$, obtaining
$$(1+|\eta|^2)^{-s} \le 2^{-s}(1+|\xi|^2)^{-s}(1+|\eta-\xi|^2)^{-s},$$
which is again the desired result. ∎
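Lemma 9.19 (often called Peetre's inequality) is easy to spot-check numerically. The sketch below draws random $\xi, \eta \in \mathbb{R}^n$ and $s \in [-5, 5]$ with a fixed seed and checks the inequality, allowing a tiny relative tolerance for floating-point rounding. The sampling ranges are arbitrary choices, and passing random trials is no substitute for the proof above.

```python
import random


def sq_norm(v):
    """|v|^2 for a vector given as a list of floats."""
    return sum(c * c for c in v)


def peetre_holds(xi, eta, s, rel_tol=1e-12):
    """Check (1+|xi|^2)^s (1+|eta|^2)^(-s) <= 2^|s| (1+|xi-eta|^2)^|s| up to a tiny relative tolerance."""
    lhs = (1 + sq_norm(xi)) ** s * (1 + sq_norm(eta)) ** (-s)
    diff = [a - b for a, b in zip(xi, eta)]
    rhs = 2 ** abs(s) * (1 + sq_norm(diff)) ** abs(s)
    return lhs <= rhs * (1 + rel_tol)


rng = random.Random(0)  # fixed seed so the run is reproducible
trials = 0
for _ in range(10_000):
    n = rng.randint(1, 4)
    xi = [rng.uniform(-50, 50) for _ in range(n)]
    eta = [rng.uniform(-50, 50) for _ in range(n)]
    s = rng.uniform(-5, 5)
    assert peetre_holds(xi, eta, s), (xi, eta, s)
    trials += 1
print("inequality held in", trials, "random trials")
```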
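9.20 Theorem. Suppose that $\phi \in C_0(\mathbb{R}^n)$ and that $\widehat{\phi}$ is a function that satisfies
$$\int |\widehat{\phi}(\xi)|(1+|\xi|^2)^{a/2}\,d\xi = C < \infty$$
for some $a > 0$. Then the map $M_\phi(f) = \phi f$ is a bounded operator on $H_s$ for $|s| \le a$.

Proof. Since $\Lambda_s$ is a unitary map from $H_s$ to $H_0 = L^2$, it is equivalent to show that $\Lambda_s M_\phi \Lambda_{-s}$ is a bounded operator on $L^2$. But
$$(\Lambda_s M_\phi \Lambda_{-s} f)^\wedge(\xi) = (1+|\xi|^2)^{s/2}\big[\widehat{\phi} * (\Lambda_{-s}f)^\wedge\big](\xi) = \int K(\xi,\eta)\,\widehat{f}(\eta)\,d\eta,$$
where
$$K(\xi,\eta) = (1+|\xi|^2)^{s/2}(1+|\eta|^2)^{-s/2}\,\widehat{\phi}(\xi-\eta).$$
By Lemma 9.19,
$$|K(\xi,\eta)| \le 2^{|s|/2}(1+|\xi-\eta|^2)^{|s|/2}\,|\widehat{\phi}(\xi-\eta)|,$$
so if $|s| \le a$, then $\int |K(\xi,\eta)|\,d\xi$ and $\int |K(\xi,\eta)|\,d\eta$ are bounded by $2^{|s|/2}C$. That $\Lambda_s M_\phi \Lambda_{-s}$ is bounded on $L^2$ therefore follows from the Plancherel theorem and Theorem 6.18. ∎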


9.21 Corollary. If $\phi \in \mathcal{S}$, then $M_\phi$ is a bounded operator on $H_s$ for all $s \in \mathbb{R}$.


Our next result is a compactness theorem that is of great importance in the appli- 
cations of Sobolev spaces. 


9.22 Rellich's Theorem. Suppose that $\{f_k\}$ is a sequence of distributions in $H_s$ that are all supported in a fixed compact set $K$ and satisfy $\sup_k \|f_k\|_{(s)} < \infty$. Then there is a subsequence $\{f_{k_j}\}$ that converges in $H_t$ for all $t < s$.

Proof. First we observe that by Proposition 9.11, $\widehat{f_k}$ is a slowly increasing $C^\infty$ function. Pick $\phi \in C_c^\infty$ such that $\phi = 1$ on a neighborhood of $K$. Then $\phi f_k = f_k$, so $\widehat{f_k} = \widehat{\phi} * \widehat{f_k}$, where the convolution is defined pointwise by an absolutely convergent integral. By Lemma 9.19 and the Schwarz inequality,
$$(1+|\xi|^2)^{s/2}|\widehat{f_k}(\xi)| \le 2^{|s|/2}\int |\widehat{\phi}(\xi-\eta)|(1+|\xi-\eta|^2)^{|s|/2}\,|\widehat{f_k}(\eta)|(1+|\eta|^2)^{s/2}\,d\eta$$
$$\le 2^{|s|/2}\|\phi\|_{(|s|)}\|f_k\|_{(s)} \le \text{constant}.$$
Likewise, since $\partial_j(\widehat{\phi} * \widehat{f_k}) = (\partial_j\widehat{\phi}) * \widehat{f_k}$, we see that $(1+|\xi|^2)^{s/2}|\partial_j\widehat{f_k}(\xi)|$ is bounded by a constant independent of $\xi$, $j$, and $k$. In particular, the $\widehat{f_k}$'s and their first derivatives are uniformly bounded on compact sets, so by the mean value theorem and the Arzelà-Ascoli theorem there is a subsequence $\{\widehat{f_{k_j}}\}$ that converges uniformly on compact sets.

We claim that $\{f_{k_j}\}$ is Cauchy in $H_t$ for all $t < s$. Indeed, for any $R > 0$ we can write the integral
$$\|f_{k_i} - f_{k_j}\|_{(t)}^2 = \int (1+|\xi|^2)^t\,|\widehat{f_{k_i}}(\xi) - \widehat{f_{k_j}}(\xi)|^2\,d\xi$$
as the sum of the integrals over the regions $|\xi| \le R$ and $|\xi| > R$. For $|\xi| \le R$ we use the estimate
$$(1+|\xi|^2)^t \le (1+R^2)^{\max(t,0)},$$
and for $|\xi| > R$ we use the estimate
$$(1+|\xi|^2)^t = (1+|\xi|^2)^{t-s}(1+|\xi|^2)^s \le (1+R^2)^{t-s}(1+|\xi|^2)^s,$$
which yield
$$\|f_{k_i} - f_{k_j}\|_{(t)}^2 \le C R^n (1+R^2)^{\max(t,0)}\sup_{|\xi|\le R}|\widehat{f_{k_i}}(\xi) - \widehat{f_{k_j}}(\xi)|^2 + (1+R^2)^{t-s}\|f_{k_i} - f_{k_j}\|_{(s)}^2.$$
Given $\epsilon > 0$, the second term will be less than $\frac12\epsilon$ provided $R$ is chosen sufficiently large, since $t - s < 0$ and $\|f_{k_i} - f_{k_j}\|_{(s)} \le 2\sup_k\|f_k\|_{(s)}$; once such an $R$ is fixed, the first term will be less than $\frac12\epsilon$ provided $i$ and $j$ are sufficiently large. The proof is therefore complete. ∎


Although the definition of Sobolev spaces in terms of the Fourier transform entails their elements being defined on all of $\mathbb{R}^n$, these spaces can also be used in the study of local smoothness properties of functions. The key definition is as follows: If $U$ is an open set in $\mathbb{R}^n$, the localized Sobolev space $H_s^{\mathrm{loc}}(U)$ is the set of all distributions $f \in \mathcal{D}'(U)$ such that for every precompact open set $V$ with $\overline{V} \subset U$ there exists $g \in H_s$ such that $g = f$ on $V$.


9.23 Proposition. A distribution $f \in \mathcal{D}'(U)$ is in $H_s^{\mathrm{loc}}(U)$ iff $\phi f \in H_s$ for every $\phi \in C_c^\infty(U)$.

Proof. If $f \in H_s^{\mathrm{loc}}(U)$ and $\phi \in C_c^\infty(U)$, then $f$ agrees with some $g \in H_s$ on a neighborhood of $\mathrm{supp}(\phi)$; hence $\phi f = \phi g \in H_s$ by Corollary 9.21. For the converse, given a precompact open $V$ with $\overline{V} \subset U$, we can choose $\phi \in C_c^\infty(U)$ with $\phi = 1$ on a neighborhood of $\overline{V}$ by the $C^\infty$ Urysohn lemma; then $\phi f \in H_s$ and $\phi f = f$ on $V$. (We have implicitly used Proposition 4.31 to obtain compact neighborhoods of $\mathrm{supp}\,\phi$ and $\overline{V}$ in $U$.) ∎


We conclude this section with one of the classic applications of Sobolev spaces, a 
regularity theorem for certain partial differential operators. 

If $L = \sum_0^m a_j (d/dx)^j$ is an ordinary differential operator with $C^\infty$ coefficients such that $a_m$ never vanishes, it is not hard to show that smooth data give smooth solutions. More precisely, if $Lu = f$ and $f$ is $C^k$ on an open interval $I$, then $u$ is $C^{k+m}$ on $I$. No such result holds for partial differential operators in general. For example, for any $f \in L^1_{\mathrm{loc}}(\mathbb{R})$ the function $u(x,t) = f(x - t)$ satisfies the wave equation $(\partial_t^2 - \partial_x^2)u = 0$, but $u$ has only as much smoothness as $f$. However, there is a large class of differential operators for which a strong regularity theorem holds. We restrict attention to the constant-coefficient case, although the results are valid in greater generality.

Let $P(D) = \sum_{|\alpha|\le m} c_\alpha D^\alpha$ (notation as in §8.7) be a constant-coefficient operator. We assume that $m$ is the true order of $P(D)$, i.e., that $c_\alpha \ne 0$ for some $\alpha$ with $|\alpha| = m$. The principal symbol $P_m$ is the sum of the top-order terms in its symbol:
$$P_m(\xi) = \sum_{|\alpha|=m} c_\alpha\xi^\alpha.$$
$P(D)$ is called elliptic if $P_m(\xi) \ne 0$ for all nonzero $\xi \in \mathbb{R}^n$. Thus, ellipticity means that, in a formal sense, $P(D)$ is genuinely $m$th order in all directions. (For example, the Laplacian $\Delta$ is elliptic on $\mathbb{R}^n$, whereas the heat and wave operators $\partial_t - \Delta$ and $\partial_t^2 - \Delta$ are not elliptic on $\mathbb{R}^{n+1}$.)


9.24 Lemma. Suppose that $P(D)$ is of order $m$. Then $P(D)$ is elliptic iff there exist $C, R > 0$ such that $|P(\xi)| \ge C|\xi|^m$ when $|\xi| \ge R$.

Proof. If $P(D)$ is elliptic, let $C_1$ be the minimum value of $|P_m|$ on the unit sphere $|\xi| = 1$. Then $C_1 > 0$, and since $P_m$ is homogeneous of degree $m$, we have $|P_m(\xi)| \ge C_1|\xi|^m$ for all $\xi$. On the other hand, $P - P_m$ is of order $m - 1$, so there exists $C_2$ such that $|P(\xi) - P_m(\xi)| \le C_2|\xi|^{m-1}$ for $|\xi| \ge 1$. Therefore,
$$|P(\xi)| \ge |P_m(\xi)| - |P(\xi) - P_m(\xi)| \ge \tfrac12 C_1|\xi|^m \quad\text{for } |\xi| \ge \max(1, 2C_2C_1^{-1}).$$
Conversely, if $P(D)$ is not elliptic, say $P_m(\xi_0) = 0$ with $\xi_0 \ne 0$, then $|P(\xi)| \le C|\xi|^{m-1}$ for every scalar multiple $\xi$ of $\xi_0$ with $|\xi| \ge 1$, so no estimate $|P(\xi)| \ge C'|\xi|^m$ can hold for all large $|\xi|$. ∎
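Lemma 9.24 can be seen concretely by computing $|P(\zeta)|/|\zeta|^m$ along a ray. The sketch below uses the symbols of §8.7 as they appear in Exercise 37: $\Delta$ has symbol $-4\pi^2|\xi|^2$ and $\partial_t - \Delta$ has symbol $2\pi i\tau + 4\pi^2|\xi|^2$, both of order $m = 2$. For the Laplacian the ratio is the constant $4\pi^2$. For the heat operator, take the direction $\xi = 0$, where the principal symbol $4\pi^2|\xi|^2$ vanishes. There the ratio equals $2\pi/|\tau|$ and tends to $0$, so no bound $|P| \ge C|\zeta|^2$ can hold along that ray.

```python
import math


def laplace_symbol(xi):
    """Symbol of the Laplacian on R^n: -4 pi^2 |xi|^2 (order m = 2)."""
    return -4 * math.pi ** 2 * sum(c * c for c in xi)


def heat_symbol(xi, tau):
    """Symbol of d_t - Laplacian on R^(n+1), as in Exercise 37: 2 pi i tau + 4 pi^2 |xi|^2 (order m = 2)."""
    return 2j * math.pi * tau + 4 * math.pi ** 2 * sum(c * c for c in xi)


def ratio(p_value, point, m=2):
    """|P(zeta)| / |zeta|^m; by Lemma 9.24, ellipticity means this stays >= C > 0 for large |zeta|."""
    return abs(p_value) / math.hypot(*point) ** m


for t in (1e1, 1e3, 1e5):
    lap = ratio(laplace_symbol([t, 0.0]), [t, 0.0])
    heat = ratio(heat_symbol([0.0], t), [0.0, t])  # ray xi = 0, where the principal part vanishes
    print(f"|zeta| = {t:g}: Laplacian ratio {lap:.4f}, heat ratio {heat:.2e}")
```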


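9.25 Lemma. If $P(D)$ is elliptic of order $m$, $u \in H_s$, and $P(D)u \in H_s$, then $u \in H_{s+m}$.

Proof. The hypotheses say that $(1+|\xi|^2)^{s/2}\widehat{u} \in L^2$ and $(1+|\xi|^2)^{s/2}P\widehat{u} \in L^2$. By Lemma 9.24, for some $R \ge 1$ we have
$$(1+|\xi|^2)^{m/2} \le 2^{m/2}|\xi|^m \le C^{-1}2^{m/2}|P(\xi)| \quad\text{for } |\xi| \ge R,$$
and $(1+|\xi|^2)^{m/2} \le (1+R^2)^{m/2}$ for $|\xi| \le R$. It follows that
$$(1+|\xi|^2)^{(s+m)/2}|\widehat{u}| \le \big[C^{-1}2^{m/2} + (1+R^2)^{m/2}\big](1+|\xi|^2)^{s/2}\big(|P\widehat{u}| + |\widehat{u}|\big) \in L^2,$$
that is, $u \in H_{s+m}$. ∎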


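9.26 The Elliptic Regularity Theorem. Suppose that $L$ is a constant-coefficient elliptic differential operator of order $m$, $\Omega$ is an open set in $\mathbb{R}^n$, and $u \in \mathcal{D}'(\Omega)$. If $Lu \in H_s^{\mathrm{loc}}(\Omega)$ for some $s \in \mathbb{R}$, then $u \in H_{s+m}^{\mathrm{loc}}(\Omega)$; and if $Lu \in C^\infty(\Omega)$, then $u \in C^\infty(\Omega)$.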


Proof. The second assertion follows from the first in view of Corollary 9.18, so by Proposition 9.23 we must show that if $Lu \in H_s^{\mathrm{loc}}(\Omega)$ and $\phi \in C_c^\infty(\Omega)$, then $\phi u \in H_{s+m}$. Let $V$ be a precompact open set such that $\mathrm{supp}(\phi) \subset V \subset \overline{V} \subset \Omega$, and choose $\psi \in C_c^\infty(\Omega)$ such that $\psi = 1$ on $\overline{V}$. Then $\psi u \in \mathcal{E}'$, so it follows from Proposition 9.11 that $\psi u \in H_\sigma$ for some $\sigma \in \mathbb{R}$. By decreasing $\sigma$ we may assume that $s + m - \sigma$ is a positive integer $k$. Set $\psi_0 = \psi$ and $\psi_k = \phi$, and choose recursively $\psi_1, \ldots, \psi_{k-1} \in C_c^\infty$ such that $\psi_j = 1$ on a neighborhood of $\mathrm{supp}(\phi)$ and $\mathrm{supp}(\psi_j)$ is contained in the set where $\psi_{j-1} = 1$. We shall prove by induction that $\psi_j u \in H_{\sigma+j}$. When $j = k$, we obtain $\phi u = \psi_k u \in H_{\sigma+k} = H_{s+m}$, which will complete the proof.

The crucial observation is that for any $\zeta \in C_c^\infty$ the operator $[L, \zeta]$ defined by
$$[L, \zeta]f = L(\zeta f) - \zeta Lf$$
is a differential operator of order $m - 1$ whose coefficients are linear combinations of derivatives of $\zeta$; in particular, these coefficients are $C_c^\infty$ functions that vanish on any open set where $\zeta$ is constant. (This follows from the product rule for derivatives.) Thus, if $f \in H_t$, we have $\partial^\alpha f \in H_{t-(m-1)}$ for $|\alpha| \le m - 1$ and hence $[L, \zeta]f \in H_{t-(m-1)}$ by Theorem 9.20.

For $j = 0$ we have $\psi_0 u \in H_\sigma$ by assumption. Suppose we have established that $\psi_j u \in H_{\sigma+j}$, where $0 \le j < k$. Then by the preceding remarks,
$$L(\psi_{j+1}u) = \psi_{j+1}Lu + [L, \psi_{j+1}]u = \psi_{j+1}Lu + [L, \psi_{j+1}](\psi_j u)$$
$$\in H_s + H_{\sigma+j-(m-1)} \subset H_{\sigma+j+1-m}.$$
Since $\psi_{j+1}u = \psi_{j+1}\psi_j u \in H_{\sigma+j}$, Lemma 9.25 (with $P(D) = L$) implies that $\psi_{j+1}u \in H_{\sigma+j+1}$, and we are done. ∎


Two classical special cases of this theorem are particularly noteworthy. First, every distribution solution of Laplace's equation $\Delta u = 0$ is a $C^\infty$ function. (This fact is known as Weyl's lemma.) Second, if $L = \partial_1 + i\partial_2$ on $\mathbb{R}^2$, the equation $Lu = 0$ is the Cauchy-Riemann equation, whose solutions are the holomorphic (or analytic) functions of $z = x_1 + ix_2$. We thus recover the fact that holomorphic functions are $C^\infty$.


Exercises 
30. Let $f_s(\xi) = (1+|\xi|^2)^{s/2}$. Then $|\partial^\alpha f_s(\xi)| \le C_\alpha(1+|\xi|)^{s-|\alpha|}$.

31. If $k \in \mathbb{N}$, $H_k$ is the space of all $f \in L^2$ that possess strong $L^2$ derivatives $\partial^\alpha f$, as defined in Exercise 8 in §8.2, for $|\alpha| \le k$; and these strong derivatives coincide with the distribution derivatives.

32. Suppose $r < s < t$. For any $\epsilon > 0$ there exists $C > 0$ such that $\|f\|_{(s)} \le \epsilon\|f\|_{(t)} + C\|f\|_{(r)}$ for all $f \in H_t$.


33. (Converse of the Sobolev Theorem) If $H_s \subset C_0^k$, then $s > k + \frac12 n$. (Use the closed graph theorem to show that the inclusion map $H_s \to C_0^k$ is continuous and hence that $\partial^\alpha\delta \in (H_s)^*$ for $|\alpha| \le k$.)


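34. (A Sharper Sobolev Theorem) For $0 < \alpha < 1$, let
$$\Lambda_\alpha(\mathbb{R}^n) = \Big\{f \in BC(\mathbb{R}^n) : \sup_{x\ne y}\frac{|f(x) - f(y)|}{|x - y|^\alpha} < \infty\Big\}.$$
a. If $s = \frac12 n + \alpha$ where $0 < \alpha < 1$, then $\|\tau_x\delta - \tau_y\delta\|_{(-s)} \le C_\alpha|x - y|^\alpha$. (We have $(\tau_x\delta)^\wedge(\xi) = e^{-2\pi i\xi\cdot x}$. Write the integral defining $\|\tau_x\delta - \tau_y\delta\|_{(-s)}^2$ as the sum of the integrals over the regions $|\xi| \le R$ and $|\xi| > R$, where $R = |x - y|^{-1}$, and use the mean value theorem to estimate $(\tau_x\delta - \tau_y\delta)^\wedge$ on the first region.)
b. If $s = \frac12 n + \alpha$ where $0 < \alpha < 1$, then $H_s \subset \Lambda_\alpha(\mathbb{R}^n)$.
c. If $s = \frac12 n + k + \alpha$ where $k \in \mathbb{N}$ and $0 < \alpha < 1$, then
$$H_s \subset \{f \in C_0^k : \partial^\beta f \in \Lambda_\alpha(\mathbb{R}^n) \text{ for } |\beta| \le k\}.$$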


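35. The Sobolev theorem says that if $s > \frac12 n$, it makes sense to evaluate functions in $H_s$ at a point. For $0 < s < \frac12 n$, functions in $H_s$ are only defined a.e., but if $s > \frac12 k$ with $k < n$, it makes sense to restrict functions in $H_s$ to subspaces of codimension $k$. More precisely, let us write $\mathbb{R}^n = \mathbb{R}^{n-k} \times \mathbb{R}^k$, $x = (y, z)$, $\xi = (\eta, \zeta)$, and define $R : \mathcal{S}(\mathbb{R}^n) \to \mathcal{S}(\mathbb{R}^{n-k})$ by $Rf(y) = f(y, 0)$.
a. $(Rf)^\wedge(\eta) = \int \widehat{f}(\eta, \zeta)\,d\zeta$. (See Exercise 20 in §8.3.)
b. If $s > \frac12 k$,
$$|(Rf)^\wedge(\eta)|^2 \le C_s(1+|\eta|^2)^{(k/2)-s}\int |\widehat{f}(\eta,\zeta)|^2(1+|\eta|^2+|\zeta|^2)^s\,d\zeta.$$
c. $R$ extends to a bounded map from $H_s(\mathbb{R}^n)$ to $H_{s-(k/2)}(\mathbb{R}^{n-k})$ provided $s > \frac12 k$.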
36. Suppose that $0 \ne \phi \in C_c^\infty$ and $\{a_j\}$ is a sequence in $\mathbb{R}^n$ with $|a_j| \to \infty$, and let $\phi_j(x) = \phi(x - a_j)$. Then $\{\phi_j\}$ is bounded in $H_s$ for every $s$ but has no convergent subsequence in $H_t$ for any $t$.


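37. The heat operator $\partial_t - \Delta$ is not elliptic, but a weakened version of Theorem 9.26 holds for it. Here we are working on $\mathbb{R}^{n+1}$ with coordinates $(x, t)$ and dual coordinates $(\xi, \tau)$, and $\partial_t - \Delta = P(D)$ where $P(\xi, \tau) = 2\pi i\tau + 4\pi^2|\xi|^2$.
a. There exist $C, R > 0$ such that $|\xi|\,|(\xi,\tau)|^{1/2} \le C|P(\xi,\tau)|$ for $|(\xi,\tau)| \ge R$. (Consider the regions $|\tau| \le |\xi|^2$ and $|\tau| > |\xi|^2$ separately.)
b. If $f \in H_s$ and $(\partial_t - \Delta)f \in H_s$, then $f \in H_{s+1}$ and $\partial_{x_j}f \in H_{s+(1/2)}$ for $1 \le j \le n$.
c. If $\zeta \in C_c^\infty(\mathbb{R}^{n+1})$, we have
$$[\partial_t - \Delta, \zeta]f = (\partial_t\zeta - \Delta\zeta)f - 2\sum_1^n(\partial_{x_j}\zeta)(\partial_{x_j}f).$$
d. If $\Omega$ is open in $\mathbb{R}^{n+1}$, $u \in \mathcal{D}'(\Omega)$, and $(\partial_t - \Delta)u \in H_s^{\mathrm{loc}}(\Omega)$, then $u \in H_{s+1}^{\mathrm{loc}}(\Omega)$. (Let $\psi_j$ be as in the proof of Theorem 9.26. Show inductively that if $\psi_0 u \in H_\sigma$, then $\psi_j u \in H_{\sigma+(j/2)}$ and $\partial_{x_i}(\psi_j u) \in H_{\sigma+(j-1)/2}$ provided $\sigma + \frac12 j \le s$.)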
38. Suppose $s_0 < s_1$ and $t_0 < t_1$, and for $0 < \lambda < 1$ let
$$s_\lambda = (1-\lambda)s_0 + \lambda s_1, \qquad t_\lambda = (1-\lambda)t_0 + \lambda t_1.$$
If $T$ is a bounded linear map from $H_{s_0}$ to $H_{t_0}$ whose restriction to $H_{s_1}$ is bounded from $H_{s_1}$ to $H_{t_1}$, then the restriction of $T$ to $H_{s_\lambda}$ is bounded from $H_{s_\lambda}$ to $H_{t_\lambda}$ for $0 < \lambda < 1$. ($T$ is bounded from $H_s$ to $H_t$ iff $\Lambda_t T\Lambda_{-s}$ is bounded on $L^2$. Observe that $\Lambda_z$ is well defined for all $z \in \mathbb{C}$ and $\Lambda_z$ is unitary on every $H_s$ if $\operatorname{Re} z = 0$. Let $s(z) = (1-z)s_0 + zs_1$, $t(z) = (1-z)t_0 + zt_1$, and for $0 \le \operatorname{Re} z \le 1$ and $\phi, \psi \in \mathcal{S}$ let $F(z) = \int [\Lambda_{t(z)}T\Lambda_{-s(z)}\phi]\,\psi$. Apply the three lines lemma as in the proof of the Riesz-Thorin theorem.)


39. Let $\Omega$ be an open set in $\mathbb{R}^n$, and let $G : \Omega \to \mathbb{R}^n$ be a $C^\infty$ diffeomorphism. For any $\phi \in C_c^\infty(G(\Omega))$, the map $Tf = (\phi f)\circ G$ is bounded on $H_s$ for all $s$; consequently, $f\circ G \in H_s^{\mathrm{loc}}(\Omega)$ whenever $f \in H_s^{\mathrm{loc}}(G(\Omega))$. Proceed as follows:
a. If $s = 0, 1, 2, \ldots$, use the chain rule and the fact that $f \in H_s$ iff $\partial^\alpha f \in L^2$ for $|\alpha| \le s$.
b. Use Exercise 38 to obtain the result for all $s \ge 0$.
c. For $s < 0$, use Proposition 9.16 and the fact that the transpose of $T$ is another operator of the same type, namely, $T'f = (\psi f)\circ H$ where $H = G^{-1}$ and $\psi = (J\phi)\circ G$ with $J(x) = |\det D_x H|$.


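40. State and prove analogues of the results in this section for the periodic Sobolev spaces
$$H_s(\mathbb{T}^n) = \Big\{f \in \mathcal{D}'(\mathbb{T}^n) : \sum_{\kappa\in\mathbb{Z}^n}(1+|\kappa|^2)^s|\widehat{f}(\kappa)|^2 < \infty\Big\}.$$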


9.4 NOTES AND REFERENCES 


The mathematical foundations for the theory of distributions were largely laid in 
the 1930s. On the one hand, several researchers in partial differential equations 
arrived at the notion of "weak derivatives" of functions; to wit, if $f, g \in L^1_{\mathrm{loc}}(U)$, $g$ is the derivative $\partial^\alpha f$ in the weak sense if $\int g\phi = (-1)^{|\alpha|}\int f\,\partial^\alpha\phi$ for all $\phi$ in some suitable space of test functions. On the other, various attempts were made to extend the domain of the Fourier transform beyond $L^1 + L^2$. The idea of defining
“generalized functions” as linear functionals on certain function spaces goes back to 
Sobolev [136], but it was Laurent Schwartz who systematically developed the theory 
of distributions and who introduced the spaces S and S’ as natural domains for the 
Fourier transform. (See Dieudonné [33] for a more detailed historical account.) 

Rudin [126] contains a good concise introduction to the theory of distributions that 
includes some functional-analytic points we have elided, such as the definition of the 
topology on $C_c^\infty(U)$ and the properties of convolution on $\mathcal{D}' \times \mathcal{E}'$. More extensive
treatments can be found in Gelfand and Shilov [55], Schwartz [132], and Treves [150].
Hörmander [77] contains an excellent full-scale treatment of distribution theory with
a view toward its applications to differential equations. 


§9.2: See Folland [49] for a case study of the analytic techniques used in manipulating distributions and their Fourier transforms and applying them to differential equations.


§9.3: The spaces originally considered by Sobolev [137] are the spaces $H_k^p$ of functions $f \in L^p$ whose distribution derivatives $\partial^\alpha f$ are in $L^p$ for $|\alpha| \le k$. When $1 < p < \infty$, it turns out that $H_k^p = \{f : \Lambda_k f \in L^p\}$ (although this is far from obvious when $p \ne 2$), and this characterization of $H_k^p$ can be used to define $H_s^p$ for all $s \in \mathbb{R}$. Sobolev's embedding theorem, in this setting, is that if $s < n/p$ then $H_s^p \subset L^q$ where $q^{-1} = p^{-1} - n^{-1}s$, and if $s > k + n/p$ then $H_s^p \subset C^k$. See Stein [140, §§V.2-3]. Further results on Sobolev spaces and their applications can be found in Adams [1] and Lieb and Loss [93].

In Rellich's theorem the hypothesis that $K$ is compact can be replaced by the hypothesis that $m(K) < \infty$ when $s > 0$; see Lair [89].

A differential operator $L = \sum_{|\alpha|\le m} a_\alpha D^\alpha$ with $C^\infty$ coefficients is called elliptic on $\Omega \subset \mathbb{R}^n$ if $\sum_{|\alpha|=m} a_\alpha(x)\xi^\alpha \ne 0$ for all $x \in \Omega$ and all nonzero $\xi \in \mathbb{R}^n$. The elliptic regularity theorem remains true as stated for such operators; see Folland [48]. The $L^p$ version of this theorem is also valid for $1 < p < \infty$, but not for $p = 1$ or $p = \infty$. It is not true, except in dimension 1, that if $Lu \in C^k(\Omega)$ then $u \in C^{k+m}(\Omega)$, but if $\partial^\alpha Lu$ is not just continuous but Hölder continuous of exponent $\lambda$ ($0 < \lambda < 1$) for $|\alpha| = k$, then $u \in C^{k+m}(\Omega)$ and $\partial^\beta u$ satisfies the same Hölder condition for $|\beta| = k + m$. See Taylor [147, Chapter XI].
Topics in Probability Theory


Probability theory, originally conceived to analyze games of chance, has developed 
into a broadly useful discipline with deep connections to other branches of mathemat- 
ics and many applications to other subjects such as physics, statistics, and economics. 
The mathematical study of probability began some two and a half centuries ago, but 
until the advent of modern analytic tools the theory was limited to combinatorial the- 
orems involving discrete sample spaces and a few other results of somewhat doubtful 
rigor. It is now recognized that the fundamental datum in probability theory is a measure space $(X, \mathcal{M}, \mu)$ such that $\mu(X) = 1$; such a measure $\mu$ is called a probability measure. $X$ is to be considered as the set of all possible outcomes of some process, such as an experiment or a gambling game, and the measure of a set $E \in \mathcal{M}$ is interpreted as the probability that the outcome lies in $E$. Although measure spaces
are the natural setting for the study of probability, it is hardly accurate to say that 
probability theory is a branch of measure theory, for its central ideas and many of its 
techniques are distinctively its own. 

This brief chapter is intended not as a systematic introduction to probability theory 
but rather as an advertisement for the subject; it also serves to illustrate further some 
results of previous chapters. 


10.1 BASIC CONCEPTS 


Probability theory has its own vocabulary, which is partly a legacy of its development 
before the connection with measure theory was made explicit and partly a result of the 
fact that the probabilistic point of view is different. We therefore begin by presenting
a brief dictionary of probabilists’ dialect. 


| Analysts' Term | Probabilists' Term |
|---|---|
| Measure space $(X, \mathcal{M}, \mu)$ with $\mu(X) = 1$ | Sample space $(\Omega, \mathcal{B}, P)$ |
| ($\sigma$-)algebra | ($\sigma$-)field |
| Measurable set | Event |
| Measurable real-valued function $f$ | Random variable $X$ |
| Integral of $f$, $\int f\,d\mu$ | Expectation or mean of $X$, $E(X)$ |
| $L^p$ [as adjective] | Having finite $p$th moment |
| Convergence in measure | Convergence in probability |
| Almost every(where), a.e. | Almost sure(ly), a.s. |
| Borel probability measure on $\mathbb{R}$ | Distribution |
| Fourier transform of a measure | Characteristic function of a distribution |
| Characteristic function | Indicator function |


Probabilists have an aversion to displaying the arguments of random variables. For example, $\{\omega : X(\omega) > a\}$ and $P(\{\omega : X(\omega) > a\})$ are commonly written as $\{X > a\}$ and $P(X > a)$.

Henceforth we shall, for the most part, adopt probabilistic language in this chapter, although we shall use the term "$L^p$ random variable" in preference to the more cumbersome "random variable with finite $p$th moment." One more standard piece of terminology, which has no equivalent in classical analysis, is the following: If $X$ is a random variable, its variance $\sigma^2(X)$ and standard deviation $\sigma(X)$ are defined by
$$\sigma^2(X) = \inf_{a\in\mathbb{R}} E[(X - a)^2], \qquad \sigma(X) = \sqrt{\sigma^2(X)}.$$
If $X \notin L^2$, then $\sigma^2(X) = \infty$. If $X \in L^2$, then $E[(X - a)^2] = E(X^2) - 2aE(X) + a^2$ is a quadratic function of $a$ whose minimum occurs when $a = E(X)$; hence
$$\sigma^2(X) = E[(X - E(X))^2] = E(X^2) - E(X)^2 \qquad (X \in L^2).$$
$\sigma(X)$ is a measure of how widely $X$ deviates from its mean $E(X)$.
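The description of $\sigma^2(X)$ as a minimum over $a$ can be checked exactly for a discrete random variable. In the sketch below, the distribution `dist` is a made-up example. Exact rational arithmetic (`fractions.Fraction`) avoids rounding. Expanding the square gives $E[(X - a)^2] = \sigma^2(X) + (a - E(X))^2$, so this quantity is smallest at $a = E(X)$.

```python
from fractions import Fraction

# A made-up discrete distribution: value -> probability (the probabilities sum to 1).
dist = {Fraction(-1): Fraction(1, 4), Fraction(0): Fraction(1, 4), Fraction(3): Fraction(1, 2)}


def expect(g):
    """E(g(X)) = sum of g(t) * P(X = t) over the support of dist."""
    return sum(g(t) * p for t, p in dist.items())


def mse(a):
    """E[(X - a)^2], which is a quadratic polynomial in a."""
    return expect(lambda t: (t - a) ** 2)


mean = expect(lambda t: t)
variance = expect(lambda t: t * t) - mean ** 2  # E(X^2) - E(X)^2
print(mean, variance, mse(mean))
```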

At this point we must discuss a general measure-theoretic construction. Let $(\Omega, \mathcal{B}, P)$ be a probability space (or, for that matter, an arbitrary measure space), let $(\Omega', \mathcal{B}')$ be another measurable space, and let $\phi : \Omega \to \Omega'$ be a $(\mathcal{B}, \mathcal{B}')$-measurable map. Then the measure $P$ induces an image measure $P_\phi$ on $\Omega'$ by
$$P_\phi(E) = P(\phi^{-1}(E)).$$
That this is indeed a measure follows from the fact that $\phi^{-1}$ commutes with unions and intersections.


10.1 Proposition. With notation as above, if $f : \Omega' \to \mathbb{R}$ is a measurable function, then $\int_{\Omega'} f\,dP_\phi = \int_\Omega (f\circ\phi)\,dP$ whenever either side is defined.

Proof. When $f = \chi_E$ with $E \in \mathcal{B}'$, this is just the definition of $P_\phi$, since $\chi_E\circ\phi = \chi_{\phi^{-1}(E)}$. The general result follows by taking linear combinations and limits. ∎
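On a finite probability space, Proposition 10.1 reduces to regrouping a finite sum. The sketch below makes this explicit. The space $\Omega = \{0, \ldots, 5\}$ with uniform $P$, the map $\phi(\omega) = \omega \bmod 3$, and the function $f(y) = y^2 + 1$ are arbitrary illustrative choices.

```python
from collections import defaultdict
from fractions import Fraction

# Illustrative finite probability space Omega = {0, ..., 5} with the uniform measure.
P = {w: Fraction(1, 6) for w in range(6)}


def phi(w):
    """A map from Omega to Omega' = {0, 1, 2}."""
    return w % 3


def f(y):
    """A function on Omega'."""
    return y * y + 1


def image_measure(P, phi):
    """P_phi({y}) = P(phi^{-1}({y})): move the mass at each point w to phi(w)."""
    Q = defaultdict(Fraction)
    for w, p in P.items():
        Q[phi(w)] += p
    return dict(Q)


P_phi = image_measure(P, phi)
lhs = sum(f(y) * q for y, q in P_phi.items())   # integral of f with respect to P_phi
rhs = sum(f(phi(w)) * p for w, p in P.items())  # integral of f o phi with respect to P
print(P_phi, lhs, rhs)
```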
If $X$ is a random variable on $\Omega$, then $P_X$ is a probability measure on $\mathbb{R}$, called the distribution of $X$, and the function
$$F(t) = P_X((-\infty, t]) = P(X \le t)$$
(which determines $P_X$ by Theorem 1.16) is called the distribution function of $X$. If $\{X_\alpha\}_{\alpha\in A}$ is a family of random variables such that $P_{X_\alpha} = P_{X_\beta}$ for all $\alpha, \beta \in A$, the $X_\alpha$'s are said to be identically distributed.

More generally, for any finite sequence $X_1, \ldots, X_n$ of random variables, we can consider $(X_1, \ldots, X_n)$ as a map from $\Omega$ to $\mathbb{R}^n$, and the measure $P_{(X_1,\ldots,X_n)}$ on $\mathbb{R}^n$ is called the joint distribution of $X_1, \ldots, X_n$. It is a general principle that all properties of random variables that are relevant to probability theory can be expressed in terms of their joint distributions. For example, by Proposition 10.1,
$$E(X) = \int t\,dP_X(t), \qquad \sigma^2(X) = \int (t - E(X))^2\,dP_X(t),$$
$$E(X + Y) = \int (t + s)\,dP_{(X,Y)}(t, s).$$
In fact, given a Borel probability measure $\lambda$ on $\mathbb{R}$, one can simply speak of the mean $\bar\lambda$ and variance $\sigma^2(\lambda)$ of $\lambda$,
$$\bar\lambda = \int t\,d\lambda(t), \qquad \sigma^2(\lambda) = \int (t - \bar\lambda)^2\,d\lambda(t),$$
which are the mean and variance of any random variable with distribution $\lambda$.

One of the most important concepts in probability theory, and the one that most clearly sets it apart from general measure theory, is that of (stochastic) independence. To motivate this idea, consider a probability space $(\Omega, \mathcal{B}, P)$ and an event $E$ such that $P(E) > 0$. Then the set function $P_E(F) = P(E \cap F)/P(E)$ is a probability measure on $\Omega$ called the conditional probability on $E$; $P_E(F)$ represents the probability of the event $F$ given that $E$ occurs. If $P_E(F) = P(F)$, that is, if the probability of $F$ is the same whether or not we restrict to $E$, then $F$ is said to be independent of $E$. Thus, $F$ is independent of $E$ iff $P(E \cap F) = P(E)P(F)$; moreover, the latter condition is clearly symmetric in $E$ and $F$ and makes sense even if $P(E) = 0$.

With this in mind, we define a collection $\{E_\alpha\}_{\alpha\in A}$ of events in $\Omega$ to be independent if
$$P(E_{\alpha_1} \cap \cdots \cap E_{\alpha_n}) = \prod_1^n P(E_{\alpha_j}) \quad\text{for all } n \in \mathbb{N} \text{ and all distinct } \alpha_1, \ldots, \alpha_n \in A.$$
(For the events $E_\alpha$ to be independent it does not suffice for them to be pairwise independent; see Exercise 1.)
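A standard example of events that are pairwise independent but not independent uses two fair coin tosses. Let $E_1$ be the event that the first toss is heads, $E_2$ the event that the second toss is heads, and $E_3$ the event that the two tosses differ. The sketch below uses exact arithmetic (the names are illustrative) and checks the product rule for every subfamily of two or more events. It illustrates the remark above and is not necessarily the example intended in Exercise 1.

```python
from fractions import Fraction
from itertools import combinations, product

omega = list(product((0, 1), repeat=2))  # two fair coin tosses, 1 = heads


def prob(event):
    """Uniform probability on the four outcomes."""
    return Fraction(len(event), len(omega))


E1 = {w for w in omega if w[0] == 1}     # first toss is heads
E2 = {w for w in omega if w[1] == 1}     # second toss is heads
E3 = {w for w in omega if w[0] != w[1]}  # the tosses differ
events = [E1, E2, E3]


def product_rule(subfamily):
    """True iff P(intersection) equals the product of the individual probabilities."""
    inter = set(omega)
    prod = Fraction(1)
    for E in subfamily:
        inter &= E
        prod *= prob(E)
    return prob(inter) == prod


pairwise = all(product_rule(pair) for pair in combinations(events, 2))
mutual = all(product_rule(sub) for k in (2, 3) for sub in combinations(events, k))
print(pairwise, mutual)  # pairwise independent, but not independent
```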

A collection $\{X_\alpha\}_{\alpha\in A}$ of random variables on $\Omega$ is called independent if the events $\{X_\alpha \in B_\alpha\} = X_\alpha^{-1}(B_\alpha)$ are independent for all Borel sets $B_\alpha \subset \mathbb{R}$. This condition can be neatly rephrased as follows. Observe that if $\alpha_1, \ldots, \alpha_n \in A$ and we write $X_j = X_{\alpha_j}$, we have
$$P\big(X_1^{-1}(B_1) \cap \cdots \cap X_n^{-1}(B_n)\big) = P\big((X_1, \ldots, X_n)^{-1}(B_1 \times \cdots \times B_n)\big) = P_{(X_1,\ldots,X_n)}(B_1 \times \cdots \times B_n),$$
whereas
$$\prod_1^n P(X_j^{-1}(B_j)) = \prod_1^n P_{X_j}(B_j) = \Big(\prod_1^n P_{X_j}\Big)(B_1 \times \cdots \times B_n).$$
That is, $\{X_\alpha\}_{\alpha\in A}$ is an independent set of random variables iff the joint distribution of any finite set of $X_\alpha$'s is the product of their individual distributions.

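The following proposition expresses the fact that functions of independent random variables are independent.

10.2 Proposition. Let $\{X_{nj} : 1 \le j \le J(n),\ 1 \le n \le N\}$ be independent random variables, and let $f_n : \mathbb{R}^{J(n)} \to \mathbb{R}$ be Borel measurable for $1 \le n \le N$. Then the random variables $Y_n = f_n(X_{n1}, \ldots, X_{nJ(n)})$, $1 \le n \le N$, are independent.

Proof. Let $X_n = (X_{n1}, \ldots, X_{nJ(n)})$. If $B_1, \ldots, B_N$ are Borel subsets of $\mathbb{R}$, we have $Y_n^{-1}(B_n) = X_n^{-1}(f_n^{-1}(B_n))$ and hence
$$P_{(Y_1,\ldots,Y_N)}(B_1 \times \cdots \times B_N) = P\Big(\bigcap_1^N Y_n^{-1}(B_n)\Big) = P_{(X_1,\ldots,X_N)}\big(f_1^{-1}(B_1) \times \cdots \times f_N^{-1}(B_N)\big).$$
Therefore, by the independence of the $X_{nj}$'s and Fubini's theorem,
$$P_{(Y_1,\ldots,Y_N)}(B_1 \times \cdots \times B_N) = \Big(\prod_{n=1}^N\prod_{j=1}^{J(n)} P_{X_{nj}}\Big)\big(f_1^{-1}(B_1) \times \cdots \times f_N^{-1}(B_N)\big)$$
$$= \prod_{n=1}^N\Big(\prod_{j=1}^{J(n)} P_{X_{nj}}\Big)(f_n^{-1}(B_n)) = \prod_{n=1}^N P_{X_n}(f_n^{-1}(B_n)) = \prod_{n=1}^N P_{Y_n}(B_n). \qquad \blacksquare$$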
We now present some fundamental properties of independent random variables. For the first one we need the notion of convolutions of measures on $\mathbb{R}$ developed in §8.6. An easy induction on (8.47) shows that if $\lambda_1, \ldots, \lambda_n \in M(\mathbb{R})$, then $\lambda_1 * \cdots * \lambda_n$ is given by
$$\lambda_1 * \cdots * \lambda_n(E) = \int\cdots\int \chi_E(t_1 + \cdots + t_n)\,d\lambda_1(t_1)\cdots d\lambda_n(t_n). \tag{10.3}$$

10.4 Proposition. If $\{X_j\}_1^n$ are independent random variables, then
$$P_{X_1+\cdots+X_n} = P_{X_1} * \cdots * P_{X_n}.$$

Proof. Let $A(t_1, \ldots, t_n) = \sum_1^n t_j$. Then $X_1 + \cdots + X_n = A(X_1, \ldots, X_n)$, so
$$P_{X_1+\cdots+X_n} = \big(P_{(X_1,\ldots,X_n)}\big)_A = \Big(\prod_1^n P_{X_j}\Big)_A,$$
and by (10.3), the last expression equals $P_{X_1} * \cdots * P_{X_n}$. ∎
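For finitely supported distributions, (10.3) is a finite sum, so Proposition 10.4 can be checked directly. The sketch below convolves the distribution of a fair die with itself. It then compares the result with the distribution of the sum of two independent dice, obtained by counting outcomes in $\{1, \ldots, 6\}^2$. The helper `convolve` is written for this illustration and represents a measure as a dict from points to masses.

```python
from fractions import Fraction
from itertools import product


def convolve(mu, nu):
    """Convolution of finitely supported measures on R, each a dict point -> mass:
    (mu * nu)({t}) is the sum of mu({a}) nu({b}) over a + b = t, the discrete case of (10.3)."""
    out = {}
    for (a, p), (b, q) in product(mu.items(), nu.items()):
        out[a + b] = out.get(a + b, Fraction(0)) + p * q
    return out


die = {k: Fraction(1, 6) for k in range(1, 7)}

# Distribution of X1 + X2 for independent fair dice, found by counting the 36 equally likely outcomes.
outcomes = list(product(range(1, 7), repeat=2))
direct = {t: Fraction(sum(1 for a, b in outcomes if a + b == t), 36) for t in range(2, 13)}

print(convolve(die, die) == direct, direct[7])
```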


10.5 Proposition. Suppose that $\{X_j\}_1^n$ are independent random variables. If $X_j \in L^1$ for all $j$, then $\prod_1^n X_j \in L^1$, and $E\big(\prod_1^n X_j\big) = \prod_1^n E(X_j)$.

Proof. We have $\prod_1^n |X_j| = f(X_1, \ldots, X_n)$ where $f(t_1, \ldots, t_n) = \prod_1^n |t_j|$. Hence
$$E\Big(\prod_1^n |X_j|\Big) = \int f\,dP_{(X_1,\ldots,X_n)} = \int f\,d\Big(\prod_1^n P_{X_j}\Big) = \prod_1^n \int |t_j|\,dP_{X_j}(t_j) = \prod_1^n E(|X_j|).$$
This proves the first assertion, and once this is known, the same argument (with the absolute values removed) proves the second one. ∎


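10.6 Corollary. If $\{X_j\}_1^n$ are independent and in $L^2$, then $\sigma^2(X_1 + \cdots + X_n) = \sum_1^n \sigma^2(X_j)$.

Proof. Let $Y_j = X_j - E(X_j)$. Then $\{Y_j\}_1^n$ are independent and have mean zero, so
$$E(Y_jY_k) = E(Y_j)E(Y_k) = 0 \qquad (j \ne k).$$
Therefore,
$$\sigma^2(X_1 + \cdots + X_n) = E\big((Y_1 + \cdots + Y_n)^2\big) = \sum_{j,k} E(Y_jY_k) = \sum_1^n E(Y_j^2) = \sum_1^n \sigma^2(X_j). \qquad \blacksquare$$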
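Corollary 10.6 can be illustrated with the distribution $\lambda = x\delta_1 + (1-x)\delta_0$ that appears in the proof of Theorem 10.7. The sketch below uses exact arithmetic and an illustrative `convolve` helper. It builds the distribution of $X_1 + \cdots + X_n$ as an $n$-fold convolution (Proposition 10.4) and checks that its variance is $n\,x(1-x)$, the sum of the individual variances. The values $x = 1/3$ and $n = 5$ are arbitrary.

```python
from fractions import Fraction


def convolve(mu, nu):
    """Convolution of finitely supported measures given as dicts point -> mass."""
    out = {}
    for a, p in mu.items():
        for b, q in nu.items():
            out[a + b] = out.get(a + b, Fraction(0)) + p * q
    return out


def mean_var(mu):
    """Mean and variance of a finitely supported probability measure."""
    m = sum(t * p for t, p in mu.items())
    return m, sum((t - m) ** 2 * p for t, p in mu.items())


x = Fraction(1, 3)
bernoulli = {1: x, 0: 1 - x}  # lambda = x delta_1 + (1 - x) delta_0
n = 5
dist = {0: Fraction(1)}       # delta_0, the identity for convolution
for _ in range(n):
    dist = convolve(dist, bernoulli)  # distribution of X_1 + ... + X_n (Proposition 10.4)

m, v = mean_var(dist)
print(m, v, n * mean_var(bernoulli)[1])
```

The masses of `dist` are the binomial weights $\binom{n}{k}x^k(1-x)^{n-k}$ that appear in Bernstein's polynomials.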
These results show that independence is a very stringent property. For one thing, it is usually not the case that the product of two $L^1$ functions is in $L^1$. For another, suppose that $X$ and $Y$ are independent and $E(X) = 0$. Then for any Borel measurable function $f$ on $\mathbb{R}$ such that $f\circ Y \in L^1$ we have
$$E(X\cdot(f\circ Y)) = E(X)E(f\circ Y) = 0.$$
In other words, $X$ is orthogonal (in the $L^2$ sense) to every function of $Y$. This indicates that, for example, if one tries to construct a sequence of independent random variables on $[0, 1]$ with Lebesgue measure by using the familiar functions of calculus, one will probably not succeed. (Perhaps the simplest example is $X_n(x) =$ the $n$th digit in the decimal expansion of $x$; see Exercise 23.) Rather, the natural setting for independence is product spaces.

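Indeed, suppose

$$\Omega = \Omega_1\times\cdots\times\Omega_n,\qquad\mathcal{B} = \mathcal{B}_1\otimes\cdots\otimes\mathcal{B}_n,\qquad P = P_1\times\cdots\times P_n.$$

Then any random variables $X_1,\ldots,X_n$ on $\Omega$ such that $X_j$ depends only on the $j$th
coordinate are independent, for if $X_j = f_j\circ\pi_j$ where $\pi_j:\Omega\to\Omega_j$ is the coordinate
map,

$$P_{(X_1,\ldots,X_n)}(B_1\times\cdots\times B_n) = P\big(f_1^{-1}(B_1)\times\cdots\times f_n^{-1}(B_n)\big) = \prod_1^n P_j\big(f_j^{-1}(B_j)\big),$$

and hence, since $P_{X_j}(B_j) = P_j(f_j^{-1}(B_j))$,

$$P_{(X_1,\ldots,X_n)} = P_{X_1}\times\cdots\times P_{X_n}.$$

The same idea can be made to work for infinitely many factors; see §10.4.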
As an application of these ideas, we present Bernstein’s constructive proof of the 
Weierstrass approximation theorem. 


10.7 Theorem. Given $f\in C([0,1])$, let

$$B_n(x) = \sum_{k=0}^n\frac{n!}{k!\,(n-k)!}\,f\Big(\frac{k}{n}\Big)x^k(1-x)^{n-k}.$$

Then $B_n\to f$ uniformly on $[0,1]$ as $n\to\infty$.


Proof. Given $x\in[0,1]$, let $\lambda = x\delta_1 + (1-x)\delta_0$ where $\delta_t$ is the point mass at $t$.
Let $\Omega = \mathbb{R}^n$, $P = \lambda\times\cdots\times\lambda$, and $X_j$ = the $j$th coordinate function on $\mathbb{R}^n$. Then
$X_1,\ldots,X_n$ are independent and have the common distribution $\lambda$. It is easy to check
(Exercise 7) that

$$\lambda^{*n} = \sum_{k=0}^n\frac{n!}{k!\,(n-k)!}x^k(1-x)^{n-k}\delta_k,$$

and hence, in view of Proposition 10.4,

$$B_n(x) = E\Big[f\Big(\frac{X_1+\cdots+X_n}{n}\Big)\Big].$$

Now, $\sum_{k=0}^n n![k!(n-k)!]^{-1}x^k(1-x)^{n-k} = 1$ by the binomial theorem, so

$$|f(x) - B_n(x)| \le \sum_{k=0}^n\big|f(x) - f(k/n)\big|\,\frac{n!}{k!\,(n-k)!}x^k(1-x)^{n-k}. \tag{10.8}$$


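Given $\epsilon>0$, by the uniform continuity of $f$ on $[0,1]$ there exists $\delta>0$, independent
of $x$ and $y$, such that $|f(x)-f(y)|<\epsilon$ whenever $|x-y|<\delta$. The sum of the terms
in (10.8) such that $|x-(k/n)|<\delta$ is at most $\epsilon$, while the sum of the remaining terms
is at most

$$2\|f\|_u\,P\Big(\Big|\frac{X_1+\cdots+X_n}{n} - x\Big|\ge\delta\Big).$$

But $E(X_j) = x$ and $\sigma^2(X_j) = x(1-x)^2 + (1-x)x^2 = x(1-x)$ for each $j$, so by Corollary 10.6,

$$E\Big[\Big(\frac{X_1+\cdots+X_n}{n} - x\Big)^2\Big] = \frac{1}{n^2}\,\sigma^2(X_1+\cdots+X_n) = \frac{x(1-x)}{n}\le\frac{1}{n},$$

and Chebyshev's inequality therefore gives

$$|f(x) - B_n(x)| \le \epsilon + \frac{2\|f\|_u}{n\delta^2},$$

which is less than $2\epsilon$ provided $n$ is sufficiently large. ∎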
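The Bernstein polynomials of Theorem 10.7 are easy to evaluate numerically. The sketch below assumes the test function $f(x) = |x-\tfrac12|$ (our choice; any continuous $f$ would do) and estimates $\|f-B_n\|_u$ by a maximum over a finite grid, so the printed numbers are grid approximations to the uniform norm rather than exact values.

```python
from math import comb


def bernstein(f, n, x):
    """B_n(x) = sum_k C(n, k) f(k/n) x^k (1 - x)^(n - k), as in Theorem 10.7."""
    return sum(comb(n, k) * f(k / n) * x**k * (1 - x) ** (n - k) for k in range(n + 1))


def sup_error(f, n, grid_size=401):
    """Approximate ||f - B_n||_u by maximizing |f - B_n| over an equally spaced grid in [0, 1]."""
    xs = [i / (grid_size - 1) for i in range(grid_size)]
    return max(abs(f(x) - bernstein(f, n, x)) for x in xs)


def f(x):
    return abs(x - 0.5)  # continuous on [0, 1] but not differentiable at 1/2


errors = {n: sup_error(f, n) for n in (4, 16, 64, 256)}
for n, err in errors.items():
    print(f"n = {n:4d}   max grid error = {err:.4f}")
```

For this particular $f$ the largest error occurs at $x = \tfrac12$ (which lies on the grid), and it shrinks roughly like $n^{-1/2}$ as $n$ grows.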


Exercises 


1. Let $\Omega$ consist of four points, each with probability $\frac14$. Find three events that are
pairwise independent but not independent. Generalize.


2. Let $\{X_j\}$ be a sequence of independent identically distributed positive random
variables such that $E(X_j) = a<\infty$ and $E(1/X_j) = b<\infty$, and let $S_n = \sum_1^n X_j$.
a. $E(X_j/S_n) = 1/n$ if $j\le n$, and $E(X_j/S_n) = aE(1/S_n)$ if $j>n$.
b. $E(S_m/S_n) = m/n$ if $m\le n$, and $E(S_m/S_n) = 1+(m-n)aE(1/S_n)$ if $m>n$.


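3. Suppose that $\{E_\alpha\}_{\alpha\in A}$ is a collection of events in $\Omega$.
a. If $E_{\alpha_1},\ldots,E_{\alpha_n}$ are independent, so are $E_{\alpha_1},\ldots,E_{\alpha_{n-1}},E_{\alpha_n}^c$.
b. If $\{E_\alpha\}$ is an independent set, so is $\{F_\alpha\}$, where each $F_\alpha$ is either $E_\alpha$ or $E_\alpha^c$.
c. $\{E_\alpha\}$ is an independent set of events iff $\{\chi_{E_\alpha}\}$ is an independent set of
random variables.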
4. Let $X,Y,Z$ be positive independent random variables with a common distribution
$\lambda$, and let $F(t) = \lambda((0,t])$. The probability that the polynomial $Xt^2+Yt+Z$
has real roots is $\iint F(t^2/4s)\,d\lambda(t)\,d\lambda(s)$.


5. If $X$ is a random variable with distribution $dP_X(t) = f(t)\,dt$ where $f(t) =
f(-t)$, then the distribution of $X^2$ is $dP_{X^2}(t) = t^{-1/2}f(t^{1/2})\chi_{(0,\infty)}(t)\,dt$.


6. For $a,\mu>0$, let $d\gamma_{a,\mu}(t) = [\Gamma(a)]^{-1}\mu^a t^{a-1}e^{-\mu t}\chi_{(0,\infty)}(t)\,dt$, the gamma
distribution with parameters $a$ and $\mu$.
a. The mean and variance of $\gamma_{a,\mu}$ are $a/\mu$ and $a/\mu^2$, respectively.
b. $\gamma_{a,\mu}*\gamma_{b,\mu} = \gamma_{a+b,\mu}$. (Use Exercise 60 in §2.6.)
c. If $X_1,\ldots,X_n$ are independent and all have the distribution $dP_X(t) =
(2\pi)^{-1/2}e^{-t^2/2}\,dt$, then $X_1^2+\cdots+X_n^2$ has the distribution $\gamma_{n/2,1/2}$. (Use
(b) and Exercise 5. $\gamma_{n/2,1/2}$ is called the chi-square distribution with $n$ degrees of freedom.)


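7. Let $\delta_t$ denote the point mass at $t\in\mathbb{R}$. Given $0<p<1$, let $\beta_p = p\delta_1+(1-p)\delta_0$,
and let $\beta_p^{*n}$ be the $n$th convolution power of $\beta_p$. Then

$$\beta_p^{*n} = \sum_{k=0}^n\frac{n!}{k!\,(n-k)!}p^k(1-p)^{n-k}\delta_k,$$

and the mean and variance of $\beta_p^{*n}$ are $np$ and $np(1-p)$. $\beta_p^{*n}$ is called the binomial
distribution on $\{0,\ldots,n\}$ with parameter $p$.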


8. Let $\delta_t$ denote the point mass at $t\in\mathbb{R}$. Given $a>0$, let $\lambda_a = e^{-a}\sum_0^\infty(a^k/k!)\delta_k$,
the Poisson distribution with parameter $a$.
a. The mean and variance of $\lambda_a$ are both equal to $a$.
b. $\lambda_a*\lambda_b = \lambda_{a+b}$.
c. The binomial distribution $\beta_{a/n}^{*n}$ (Exercise 7) converges vaguely to $\lambda_a$ as $n\to\infty$.
(Use Proposition 7.19.)


9. Suppose that $\{X_n\}_1^\infty$ is a sequence of random variables. If $X_n\to X$ in
probability, then $P_{X_n}\to P_X$ vaguely. (Use Proposition 7.19.)


10. (The Moment Convergence Theorem) Let $X_1,X_2,\ldots,X$ be random variables
such that $P_{X_n}\to P_X$ vaguely and $\sup_n E(|X_n|^r)<\infty$, where $r>0$. Then
$E(|X_n|^s)\to E(|X|^s)$ for all $s\in(0,r)$, and if also $s\in\mathbb{N}$, then $E(X_n^s)\to E(X^s)$.
(By Chebyshev's inequality, if $\epsilon>0$, there exists $a>0$ such that $P(|X_n|>a)<\epsilon$
for all $n$. Consider $\int\phi(t)|t|^s\,dP_{X_n}(t)$ and $\int[1-\phi(t)]|t|^s\,dP_{X_n}(t)$ where $\phi\in C_c(\mathbb{R})$
and $\phi(t) = 1$ for $|t|\le a$.)


10.2 THE LAW OF LARGE NUMBERS 


If one plays a gambling game many times, one's average winnings or losses per
game should be roughly the expected winnings or losses in each individual game;
more generally, if one plays a sequence of possibly different games, one's average
winnings or losses should be roughly the average of the expected winnings or losses
in the individual games. In symbols: If $\{X_j\}_1^\infty$ is a sequence of independent random
variables and $E(X_j) = \mu_j$, then the average $n^{-1}\sum_1^n X_j$ should be close to the
constant $n^{-1}\sum_1^n\mu_j$ when $n$ is large.

The law of large numbers is a precise formulation of this idea. It comes in several
versions, depending on the hypotheses one wishes to make. The first version, with
the weakest hypotheses and conclusions, has a very simple proof.


10.9 The Weak Law of Large Numbers. Let $\{X_j\}_1^\infty$ be a sequence of independent
$L^2$ random variables with means $\{\mu_j\}$ and variances $\{\sigma_j^2\}$. If $n^{-2}\sum_1^n\sigma_j^2\to0$
as $n\to\infty$, then $n^{-1}\sum_1^n(X_j-\mu_j)\to0$ in probability as $n\to\infty$.


Proof. $n^{-1}\sum_1^n(X_j-\mu_j)$ has mean 0 and variance $n^{-2}\sum_1^n\sigma_j^2$ (the latter by
Corollary 10.6). Hence by Chebyshev's inequality, for any $\epsilon>0$ we have

$$P\Big(\Big|n^{-1}\sum_1^n X_j - n^{-1}\sum_1^n\mu_j\Big|\ge\epsilon\Big)\le(n\epsilon)^{-2}\sum_1^n\sigma_j^2\to0\quad\text{as }n\to\infty.$$ ∎
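A quick Monte Carlo experiment shows the weak law at work, and also how conservative the Chebyshev estimate in the proof can be. As a hypothetical example we take the $X_j$ to be independent and uniform on $[0,1]$, so that $\mu_j = \tfrac12$ and $\sigma_j^2 = \tfrac1{12}$; the tolerance `eps`, the sample sizes, and the number of trials are arbitrary choices.

```python
import random


def deviation_probability(n, eps, trials, rng):
    """Monte Carlo estimate of P(|n^{-1} sum_j (X_j - mu_j)| >= eps) when the
    X_j are independent and uniform on [0, 1] (so mu_j = 1/2, sigma_j^2 = 1/12)."""
    hits = 0
    for _ in range(trials):
        avg = sum(rng.random() - 0.5 for _ in range(n)) / n
        if abs(avg) >= eps:
            hits += 1
    return hits / trials


rng = random.Random(0)
eps = 0.05
results = {}
for n in (10, 100, 1000):
    estimate = deviation_probability(n, eps, trials=1000, rng=rng)
    # Chebyshev bound from the proof: (n eps)^{-2} * sum_j sigma_j^2, capped at 1.
    bound = min(1.0, n * (1 / 12) / (n * eps) ** 2)
    results[n] = (estimate, bound)
    print(f"n = {n:4d}   estimate = {estimate:.3f}   Chebyshev bound = {bound:.3f}")
```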


Under slightly stronger hypotheses, we can obtain the sharper conclusion that
$n^{-1}\sum_1^n(X_j-\mu_j)\to0$ almost surely. To establish this, we need the following two
lemmas, which are of interest in their own right.


10.10 The Borel-Cantelli Lemma. Let $\{A_n\}_1^\infty$ be a sequence of events.
a. If $\sum_1^\infty P(A_n)<\infty$, then $P(\limsup A_n) = 0$.
b. If the $A_n$'s are independent and $\sum_1^\infty P(A_n) = \infty$, then $P(\limsup A_n) = 1$.


Proof. We recall that $\limsup A_n = \bigcap_{k=1}^\infty\bigcup_{n=k}^\infty A_n$, so that

$$P(\limsup A_n)\le P\Big(\bigcup_{n=k}^\infty A_n\Big)\le\sum_{n=k}^\infty P(A_n)\quad\text{for every }k,$$

and the latter sum tends to zero as $k\to\infty$ if $\sum P(A_n)$ converges. On the other
hand, suppose that $\sum P(A_n)$ diverges and the $A_n$'s are independent. We must show
that

$$P\big((\limsup A_n)^c\big) = P\Big(\bigcup_{k=1}^\infty\bigcap_{n=k}^\infty A_n^c\Big) = 0,$$

and for this it is enough to show that $P\big(\bigcap_{n=k}^\infty A_n^c\big) = 0$ for all $k$. But the $A_n^c$'s are
independent (Exercise 3), so since $1-t\le e^{-t}$,

$$P\Big(\bigcap_{n=k}^K A_n^c\Big) = \prod_{n=k}^K\big[1-P(A_n)\big]\le\prod_{n=k}^K e^{-P(A_n)} = \exp\Big(-\sum_{n=k}^K P(A_n)\Big).$$

The last expression tends to zero as $K\to\infty$, which yields the desired result. ∎
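The key estimate in part (b) compares $\prod_{n=k}^K[1-P(A_n)]$ with $\exp(-\sum_{n=k}^K P(A_n))$. The sketch below evaluates both sides for the hypothetical choice $P(A_n) = 1/n$ (for $n\ge2$), whose sum diverges; in that case the product over $2\le n\le K$ telescopes to exactly $1/K$, so both sides visibly tend to zero.

```python
import math


def no_event_probability(p, k, K):
    """P(no A_n occurs for k <= n <= K) = prod_{n=k}^{K} (1 - p(n)),
    for independent events with P(A_n) = p(n)."""
    prob = 1.0
    for n in range(k, K + 1):
        prob *= 1.0 - p(n)
    return prob


def exp_bound(p, k, K):
    """Upper bound exp(-sum_{n=k}^{K} p(n)), which comes from 1 - t <= e^{-t}."""
    return math.exp(-sum(p(n) for n in range(k, K + 1)))


def harmonic(n):
    return 1.0 / n  # sum of P(A_n) diverges, so part (b) applies


for K in (10, 100, 1000, 10000):
    print(f"K = {K:5d}   product = {no_event_probability(harmonic, 2, K):.6f}"
          f"   exp bound = {exp_bound(harmonic, 2, K):.6f}")
```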
10.11 Kolmogorov's Inequality. Let $X_1,\ldots,X_n$ be independent random variables
with mean 0 and variances $\sigma_1^2,\ldots,\sigma_n^2$, and let $S_k = X_1+\cdots+X_k$. For any $\epsilon>0$,

$$P\Big(\max_{1\le k\le n}|S_k|\ge\epsilon\Big)\le\epsilon^{-2}\sum_1^n\sigma_k^2.$$


Proof. Let $A_k$ be the set where $|S_j|<\epsilon$ for $j<k$ and $|S_k|\ge\epsilon$. Then the $A_k$'s
are disjoint and their union is the set where $\max|S_k|\ge\epsilon$, so

$$P(\max|S_k|\ge\epsilon) = \sum_1^n P(A_k)\le\epsilon^{-2}\sum_1^n E(\chi_{A_k}S_k^2),$$

because $S_k^2\ge\epsilon^2$ on $A_k$. On the other hand,

$$\begin{aligned}E(S_n^2) &\ge \sum_1^n E(\chi_{A_k}S_n^2) = \sum_1^n E\big(\chi_{A_k}[S_k^2+2S_k(S_n-S_k)+(S_n-S_k)^2]\big)\\ &\ge \sum_1^n E(\chi_{A_k}S_k^2) + 2\sum_1^n E\big(\chi_{A_k}S_k(S_n-S_k)\big).\end{aligned}$$

It will suffice to show that $E\big(\chi_{A_k}S_k(S_n-S_k)\big) = 0$ for all $k$, for then we have

$$P(\max|S_k|\ge\epsilon)\le\epsilon^{-2}E(S_n^2) = \epsilon^{-2}\sum_1^n\sigma_k^2$$

by Corollary 10.6, since the $X_k$'s have mean zero. But $\chi_{A_k}$ is a measurable function
of $S_1,\ldots,S_k$ and hence of $X_1,\ldots,X_k$, whereas $S_n-S_k$ is a measurable function
of $X_{k+1},\ldots,X_n$. Moreover, $E(S_n-S_k) = \sum_{j=k+1}^n E(X_j) = 0$ for all $k$. Therefore, by
Propositions 10.2 and 10.5,

$$E\big(\chi_{A_k}S_k(S_n-S_k)\big) = E(\chi_{A_k}S_k)\,E(S_n-S_k) = E(\chi_{A_k}S_k)\cdot0 = 0.$$
∎
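Kolmogorov's inequality can be checked by simulation. Here, as an assumed example, the $X_j$ are independent fair $\pm1$ steps (mean 0, variance 1), so the bound reads $P(\max_k|S_k|\ge\epsilon)\le n\epsilon^{-2}$; the values of `n`, `eps`, and the trial count are arbitrary choices.

```python
import random


def max_partial_sum_reaches(n, eps, rng):
    """Simulate S_k = X_1 + ... + X_k with independent fair +/-1 steps X_j and
    report whether max_{1 <= k <= n} |S_k| >= eps."""
    s = 0
    for _ in range(n):
        s += 1 if rng.random() < 0.5 else -1
        if abs(s) >= eps:
            return True
    return False


rng = random.Random(1)
n, trials = 100, 5000
results = {}
for eps in (15, 20, 30):
    hits = sum(max_partial_sum_reaches(n, eps, rng) for _ in range(trials))
    estimate = hits / trials
    bound = n / eps**2  # eps^{-2} (sigma_1^2 + ... + sigma_n^2) with every sigma_j^2 = 1
    results[eps] = (estimate, bound)
    print(f"eps = {eps}: estimate = {estimate:.4f}, Kolmogorov bound = {bound:.4f}")
```

The simulated probabilities come out noticeably smaller than the bound, which is expected: the inequality holds for every mean-zero $L^2$ distribution and cannot be sharp for all of them at once.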


10.12 Kolmogorov's Strong Law of Large Numbers. If $\{X_n\}_1^\infty$ is a sequence of
independent $L^2$ random variables with means $\{\mu_n\}$ and variances $\{\sigma_n^2\}$ such that
$\sum_1^\infty n^{-2}\sigma_n^2<\infty$, then $n^{-1}\sum_1^n(X_j-\mu_j)\to0$ almost surely as $n\to\infty$.


Proof. Let $S_n = \sum_1^n(X_j-\mu_j)$. Given $\epsilon>0$, for $k\in\mathbb{N}$ let $A_k$ be the set
where $n^{-1}|S_n|\ge\epsilon$ for some $n$ such that $2^{k-1}\le n<2^k$. Then on $A_k$ we have
$|S_n|\ge\epsilon2^{k-1}$ for some $n<2^k$, so by Kolmogorov's inequality,

$$P(A_k)\le(\epsilon2^{k-1})^{-2}\sum_{n<2^k}\sigma_n^2.$$
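Therefore,

$$\sum_{k=1}^\infty P(A_k)\le4\epsilon^{-2}\sum_{k=1}^\infty4^{-k}\sum_{n<2^k}\sigma_n^2 = 4\epsilon^{-2}\sum_{n=1}^\infty\sigma_n^2\sum_{2^k>n}4^{-k}\le\frac{16}{3}\epsilon^{-2}\sum_{n=1}^\infty n^{-2}\sigma_n^2<\infty,$$

so $P(\limsup A_k) = 0$ by the Borel-Cantelli lemma. But $\limsup A_k$ is precisely the
set where $n^{-1}|S_n|\ge\epsilon$ for infinitely many $n$, so

$$P\big(\limsup n^{-1}|S_n|\le\epsilon\big) = 1.$$

Letting $\epsilon\to0$ through a countable sequence of values, we conclude that $n^{-1}S_n\to0$
almost surely. ∎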


The hypotheses of this theorem are a bit stronger than those of the weak law
(Exercise 11). They are certainly satisfied when the $X_n$'s are identically distributed
$L^2$ random variables, since then $\sigma_n^2$ is independent of $n$. However, in the identically
distributed case the assumption that $X_n\in L^2$ can be weakened.


10.13 Khinchine's Strong Law of Large Numbers. If $\{X_n\}$ is a sequence
of independent identically distributed $L^1$ random variables with mean $\mu$, then
$n^{-1}\sum_1^n X_j\to\mu$ almost surely as $n\to\infty$.


Proof. Replacing $X_n$ by $X_n-\mu$, we may assume that $\mu = 0$. Let $\lambda$ be the
common distribution of the $X_j$'s; we are thus assuming that

$$\int|t|\,d\lambda(t)<\infty,\qquad\int t\,d\lambda(t) = 0.$$

Let $Y_j = X_j$ on the set where $|X_j|\le j$ and $Y_j = 0$ elsewhere. Then

$$\sum_1^\infty P(Y_j\neq X_j) = \sum_1^\infty P(|X_j|>j) = \sum_1^\infty\lambda(\{t:|t|>j\}) = \sum_{j=1}^\infty\sum_{k=j}^\infty\lambda(\{t:k<|t|\le k+1\}).$$

Since $\sum_{j=1}^\infty\sum_{k=j}^\infty = \sum_{k=1}^\infty\sum_{j=1}^k$, interchanging the order of summation yields

$$\sum_1^\infty P(Y_j\neq X_j) = \sum_{k=1}^\infty k\,\lambda(\{t:k<|t|\le k+1\})\le\int|t|\,d\lambda(t)<\infty.$$

By the Borel-Cantelli lemma, then, with probability one we have $X_j = Y_j$ for $j$
sufficiently large, and it therefore suffices to show that $n^{-1}\sum_1^n Y_j\to0$ almost
surely.

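We have

$$\sigma^2(Y_n)\le E(Y_n^2) = \int_{|t|\le n}t^2\,d\lambda(t) = \sum_{j=1}^n\int_{j-1<|t|\le j}t^2\,d\lambda(t),$$

and hence

$$\sum_{n=1}^\infty n^{-2}\sigma^2(Y_n)\le\sum_{n=1}^\infty\sum_{j=1}^n n^{-2}\int_{j-1<|t|\le j}t^2\,d\lambda(t).$$

Reversing the order of summation again and using the fact that $\sum_{n=j}^\infty n^{-2}\le2j^{-1}$
(since $\sum_{n=j}^\infty n^{-2}\le j^{-2}+\int_j^\infty x^{-2}\,dx$), and that $t^2\le j|t|$ when $|t|\le j$, we obtain

$$\sum_{n=1}^\infty n^{-2}\sigma^2(Y_n)\le2\sum_{j=1}^\infty j^{-1}\int_{j-1<|t|\le j}t^2\,d\lambda(t)\le2\sum_{j=1}^\infty\int_{j-1<|t|\le j}|t|\,d\lambda(t) = 2\int|t|\,d\lambda(t)<\infty.$$

By Theorem 10.12, therefore, if $\mu_j = E(Y_j)$ we have $n^{-1}\sum_1^n(Y_j-\mu_j)\to0$ almost
surely. However, by the dominated convergence theorem,

$$\mu_j = \int_{|t|\le j}t\,d\lambda(t)\to\int t\,d\lambda(t) = 0,$$

and it follows easily (Exercise 12) that $n^{-1}\sum_1^n\mu_j\to0$ also. Hence $n^{-1}\sum_1^n Y_j\to0$
a.s., and the proof is complete. ∎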
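Khinchine's theorem covers cases that Theorem 10.12 does not. The following simulation uses the Pareto distribution with density $\alpha t^{-\alpha-1}$ on $[1,\infty)$ and $\alpha = 1.5$ (a hypothetical choice), which has mean $\alpha/(\alpha-1) = 3$ but infinite variance, so the $X_j$ are in $L^1$ but not in $L^2$. It relies on Python's `random.paretovariate`, which samples from exactly this distribution; the seed and the checkpoints are arbitrary.

```python
import random

rng = random.Random(2024)
alpha = 1.5
mu = alpha / (alpha - 1)  # mean 3.0; the second moment is infinite because alpha <= 2

checkpoints = {10**k for k in range(1, 7)}
averages = {}
running_sum = 0.0
for n in range(1, 10**6 + 1):
    running_sum += rng.paretovariate(alpha)  # density alpha * t^(-alpha-1) on [1, inf)
    if n in checkpoints:
        averages[n] = running_sum / n
        print(f"n = {n:7d}   running average = {averages[n]:.4f}   (mu = {mu})")
```

Because the variance is infinite, the running averages settle down more slowly than in the $L^2$ case, and an occasional very large sample produces a visible jump.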


Thus far we have not shown how to construct sequences of random variables that 
satisfy the hypotheses of the theorems in this section. We shall do so in §10.4. 


Exercises 

11. If $\sum_1^\infty n^{-2}\sigma_n^2<\infty$, then $\lim_{n\to\infty}n^{-2}\sum_1^n\sigma_j^2 = 0$. (Estimate $\sum_1^N$ and
$\sum_{N+1}^n$ separately.)

12. If $\{a_n\}\subset\mathbb{C}$ and $\lim a_n = a$, then $\lim n^{-1}\sum_1^n a_j = a$.


13. The weak law of large numbers remains valid if the hypothesis of independence
is replaced by the (much weaker) hypothesis that $E[(X_j-\mu_j)(X_k-\mu_k)] = 0$ for
$j\neq k$.

14. If $\{X_n\}$ is a sequence of independent random variables such that $E(X_n) = 0$ and
$\sum\sigma^2(X_n)<\infty$, then $\sum X_n$ converges almost surely. (Apply Kolmogorov's
inequality to show that the partial sums are Cauchy a.s.) Corollary: If the plus and
minus signs in $\sum\pm n^{-1}$ are determined by successive tosses of a fair coin, the
resulting series converges almost surely.


15. If $\{X_n\}$ is a sequence of independent identically distributed random variables
that are not in $L^1$, then $\limsup_{n\to\infty}n^{-1}\big|\sum_1^n X_j\big| = \infty$ almost surely. (Let
$A_n = \{|X_n|>n\}$. Using an idea from the proof of Theorem 10.13, show that
$\sum P(A_n) = \infty$, and apply the Borel-Cantelli lemma.)


16. (Shannon's Theorem) Let $\{X_j\}$ be a sequence of independent random variables
on the sample space $\Omega$ having the common distribution $\lambda = \sum_1^r p_j\delta_j$ where $0<p_j<1$,
$\sum_1^r p_j = 1$, and $\delta_j$ is the point mass at $j$. Define random variables $Y_1,Y_2,\ldots$
on $\Omega$ by

$$Y_n(\omega) = P\big(\{\omega':X_i(\omega') = X_i(\omega)\text{ for }1\le i\le n\}\big).$$

a. $Y_n = \prod_1^n p_{X_i}$. (The notation is peculiar but correct: $X_i(\cdot)\in\{1,\ldots,r\}$ a.s.,
so $p_{X_i}$ is well-defined a.s.)

b. $\lim_{n\to\infty}n^{-1}\log Y_n = \sum_1^r p_j\log p_j$ almost surely. (In information theory,
the $X_j$'s are considered as the output of a source of digital signals, and
$-\sum_1^r p_j\log p_j$ is called the entropy of the signal.)


17. A collection or “population” of $N$ objects (such as mice, grains of sand, etc.)
may be considered as a sample space in which each object has probability $N^{-1}$.
Let $X$ be a random variable on this space (a numerical characteristic of the objects
such as mass, diameter, etc.) with mean $\mu$ and variance $\sigma^2$. In statistics one is
interested in determining $\mu$ and $\sigma^2$ by taking a sequence of random samples from
the population and measuring $X$ for each sample, thus obtaining a sequence $\{X_j\}$ of
numbers that are values of independent random variables with the same distribution
as $X$. The $n$th sample mean is $M_n = n^{-1}\sum_1^n X_j$ and the $n$th sample variance
is $S_n^2 = (n-1)^{-1}\sum_1^n(X_j-M_n)^2$. Show that $E(M_n) = \mu$, $E(S_n^2) = \sigma^2$, and
$M_n\to\mu$ and $S_n^2\to\sigma^2$ almost surely as $n\to\infty$. Can you see why one uses
$(n-1)^{-1}$ instead of $n^{-1}$ in the definition of $S_n^2$?


10.3 THE CENTRAL LIMIT THEOREM 


Suppose $\mu\in\mathbb{R}$ and $\sigma>0$. By Proposition 2.53 and some elementary calculus, the
measure $\nu_\mu^{\sigma^2}$ on $\mathbb{R}$ defined by

$$d\nu_\mu^{\sigma^2}(t) = \frac{1}{\sigma\sqrt{2\pi}}\,e^{-(t-\mu)^2/2\sigma^2}\,dt$$

is a probability measure that satisfies

$$\int t\,d\nu_\mu^{\sigma^2}(t) = \mu,\qquad\int(t-\mu)^2\,d\nu_\mu^{\sigma^2}(t) = \sigma^2.$$

It is called the normal or Gaussian distribution with mean $\mu$ and variance $\sigma^2$. The
special case $\nu_0^1$ is called the standard normal distribution.

It is a matter of empirical observation that normal and approximately normal
distributions are extremely common in applied probability and statistics. The theoretical
explanation for this phenomenon is the central limit theorem, the idea of which is
as follows. Suppose that $\{X_j\}$ is a sequence of independent identically distributed
random variables with mean 0 and variance $\sigma^2$. Then $n^{-1}\sum_1^n X_j$ has mean 0 and
variance $n^{-1}\sigma^2$, so there is a high probability that it is close to 0 when $n$ is large; this
is the content of the weak law of large numbers. On the other hand, $n^{-1/2}\sum_1^n X_j$
has mean 0 and variance $\sigma^2$ for all $n$, so one might ask if its distribution approaches
some nontrivial limit as $n\to\infty$. The remarkable answer is that no matter what the
distribution of the $X_j$'s is, this limit exists and equals the normal distribution with
mean 0 and variance $\sigma^2$.


The central limit theorem is really a theorem in Fourier analysis. We shall state it
as such and then translate it into probability theory.


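10.14 Theorem. Let $\lambda$ be a Borel probability measure on $\mathbb{R}$ such that

$$\int t^2\,d\lambda(t) = 1,\qquad\int t\,d\lambda(t) = 0.$$

(The finiteness of the first integral implies the existence of the second.) For $n\in\mathbb{N}$ let
$\lambda^{*n} = \lambda*\cdots*\lambda$ ($n$ factors) and define the measure $\lambda_n$ by $\lambda_n(E) = \lambda^{*n}(\sqrt{n}\,E)$,
where $\sqrt{n}\,E = \{\sqrt{n}\,t:t\in E\}$. Then $\lambda_n\to\nu_0^1$ vaguely as $n\to\infty$.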


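Proof. The hypotheses on the measure $\lambda$ imply that its Fourier transform $\hat\lambda(\xi) =
\int e^{-2\pi i\xi x}\,d\lambda(x)$ is of class $C^2$ and satisfies $\hat\lambda(0) = 1$, $\hat\lambda'(0) = 0$, and $\hat\lambda''(0) = -4\pi^2$.
(Differentiate the integral twice as in Theorem 8.22d.) Thus by Taylor's theorem,

$$\hat\lambda(\xi) = 1 - 2\pi^2\xi^2 + o(\xi^2),$$

where $o(a)$ denotes a quantity that satisfies $a^{-1}o(a)\to0$ as $a\to0$. Moreover,
$(\lambda^{*n})^\wedge = (\hat\lambda)^n$, so by the obvious change of variable,

$$\hat\lambda_n(\xi) = \int e^{-2\pi i\xi x/\sqrt{n}}\,d\lambda^{*n}(x) = \Big[\hat\lambda\Big(\frac{\xi}{\sqrt{n}}\Big)\Big]^n = \Big[1 - \frac{2\pi^2\xi^2}{n} + o\Big(\frac{\xi^2}{n}\Big)\Big]^n.$$

Thus, since $\log(1+z) = z + o(z)$,

$$\log\hat\lambda_n(\xi) = n\log\Big[1 - \frac{2\pi^2\xi^2}{n} + o\Big(\frac{\xi^2}{n}\Big)\Big] = -2\pi^2\xi^2 + n\,o\Big(\frac{\xi^2}{n}\Big),$$

which tends to $-2\pi^2\xi^2$ as $n\to\infty$. In other words, $\hat\lambda_n(\xi)\to e^{-2\pi^2\xi^2}$ as $n\to\infty$ for
all $\xi$, so the conclusion follows from Propositions 8.24 and 8.50. ∎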


10.15 The Central Limit Theorem. Let $\{X_j\}$ be a sequence of independent identically
distributed $L^2$ random variables with mean $\mu$ and variance $\sigma^2$. As $n\to\infty$,
the distribution of $(\sigma\sqrt{n})^{-1}\sum_1^n(X_j-\mu)$ converges vaguely to the standard normal
distribution $\nu_0^1$, and for all $a\in\mathbb{R}$,

$$\lim_{n\to\infty}P\Big(\frac{1}{\sigma\sqrt{n}}\sum_1^n(X_j-\mu)\le a\Big) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^a e^{-t^2/2}\,dt.$$


Proof. Replacing $X_j$ by $\sigma^{-1}(X_j-\mu)$, we may assume that $\mu = 0$ and $\sigma = 1$. If
$\lambda$ is the common distribution of the $X_j$'s, then $\lambda$ satisfies the hypotheses of Theorem
10.14, and in the notation used there, $\lambda_n$ is the distribution of $n^{-1/2}\sum_1^n X_j$. The
first assertion thus follows immediately, and the second one is equivalent to it by
Proposition 7.19. ∎
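The second assertion of Theorem 10.15 can be illustrated by simulation. This sketch assumes exponential summands with mean 1 and variance 1 (our choice, deliberately skewed), standardizes the partial sums as in the theorem, and compares empirical probabilities with the limiting value $\frac{1}{\sqrt{2\pi}}\int_{-\infty}^a e^{-t^2/2}\,dt = \frac12\big(1+\mathrm{erf}(a/\sqrt{2})\big)$, computed with `math.erf`. The seed, $n$, and the number of trials are arbitrary.

```python
import math
import random


def phi(a):
    """Standard normal distribution function (2 pi)^{-1/2} * int_{-inf}^{a} e^{-t^2/2} dt."""
    return 0.5 * (1.0 + math.erf(a / math.sqrt(2.0)))


def standardized_sum(n, rng):
    """(sigma sqrt(n))^{-1} * sum_{j=1}^{n} (X_j - mu) with X_j i.i.d. exponential of
    mean 1, so that mu = sigma = 1."""
    return sum(rng.expovariate(1.0) - 1.0 for _ in range(n)) / math.sqrt(n)


rng = random.Random(7)
n, trials = 100, 10000
samples = [standardized_sum(n, rng) for _ in range(trials)]

results = {}
for a in (-1.0, 0.0, 1.0, 2.0):
    empirical = sum(z <= a for z in samples) / trials
    results[a] = (empirical, phi(a))
    print(f"a = {a:+.1f}   empirical = {empirical:.4f}   normal limit = {phi(a):.4f}")
```

With skewed summands and moderate $n$ the agreement is only approximate; the theorem guarantees that the discrepancy tends to zero as $n\to\infty$.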
As the reader may readily verify, the same argument yields the following more
general result. Under the hypotheses of the central limit theorem, if $\{K_n\}$ is any
sequence of finite subsets of $\mathbb{N}$ such that $k_n = \mathrm{card}(K_n)\to\infty$, then the distribution
of $(\sigma\sqrt{k_n})^{-1}\sum_{j\in K_n}(X_j-\mu)$ converges vaguely to $\nu_0^1$.


Exercises 


18. A fair coin is tossed 10,000 times; let $X$ be the number of times it comes up
heads. Use the central limit theorem and a table of values (printed or electronic) of
$\mathrm{erf}(x) = 2\pi^{-1/2}\int_0^x e^{-t^2}\,dt$ to estimate

a. the probability that $4950\le X\le5050$;

b. the number $k$ such that $|X-5000|\le k$ with probability 0.98.


19. If $\{X_j\}$ satisfies the hypotheses of the central limit theorem, the sequence
$Y_n = (\sigma\sqrt{n})^{-1}\sum_1^n(X_j-\mu)$ does not converge in probability. (Use the remarks
following the central limit theorem to show that $\{Y_n\}$ is not Cauchy in probability.)


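20. If $\{X_j\}$ is a sequence of independent identically distributed random variables
with mean 0 and variance 1, the distributions of

$$\frac{\sum_1^n X_j}{\big(\sum_1^n X_j^2\big)^{1/2}}\qquad\text{and}\qquad\frac{\sqrt{n}\sum_1^n X_j}{\sum_1^n X_j^2}$$

both converge vaguely to the standard normal distribution.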


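21. Let $\{X_n\}$ be a sequence of independent random variables, each having the
Poisson distribution with mean 1 (Exercise 8). Let

$$S_n = \sum_1^n X_j,\qquad Y_n = \Big(\frac{S_n-n}{\sqrt{n}}\Big)^- = \max\Big(\frac{n-S_n}{\sqrt{n}},\,0\Big).$$

a. $E(Y_n) = e^{-n}\sum_{k=0}^n\dfrac{n^k}{k!}\,\dfrac{n-k}{\sqrt{n}} = \dfrac{n^{n+(1/2)}e^{-n}}{n!}$. (For the first equation, use
Exercise 8b. As for the second, the sum telescopes.)
b. $P_{Y_n}$ converges vaguely to $\frac12\delta_0 + \chi_{(0,\infty)}\nu_0^1$. (Use Proposition 7.19.)
c. $E(Y_n^2)\le1$ for all $n$.
d. $E(Y_n) = \int t\,dP_{Y_n}(t)\to\int_0^\infty t\,d\nu_0^1(t) = (2\pi)^{-1/2}$. (Use Exercise 10.)
Combining this with (a), one obtains Stirling's formula:

$$\lim_{n\to\infty}\frac{n!}{n^{n+(1/2)}e^{-n}} = \sqrt{2\pi}.$$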
22. In this exercise we consider random variables with values in the circle $\mathbb{T}$, regarded
as $\{z\in\mathbb{C}:|z| = 1\}$. The distribution of such a random variable is a measure on $\mathbb{T}$.
a. If $X_1,\ldots,X_n$ are independent, then $P_{X_1X_2\cdots X_n} = P_{X_1}*\cdots*P_{X_n}$.
b. If $\{X_j\}$ is a sequence of independent random variables with a common
distribution $\lambda$, the distribution of $\prod_1^n X_j$ converges vaguely to the uniform
distribution on $\mathbb{T}$ (= arc length over $2\pi$) unless $\lambda$ is supported on a finite subgroup
$Z_m = \{e^{2\pi ij/m}:0\le j<m\}$ of $\mathbb{T}$, in which case it converges to the uniform
distribution $m^{-1}\sum_{z\in Z_m}\delta_z$ on $Z_m$. (Use Exercise 8.39.)


10.4 CONSTRUCTION OF SAMPLE SPACES 


The preceding two sections have dealt with sequences of random variables whose
joint distributions have certain properties. We now address the question of finding
examples of such sequences, and more generally of constructing families $\{X_\alpha\}_{\alpha\in A}$
of random variables indexed by an arbitrary set $A$ whose finite subfamilies have
prescribed joint distributions.

If the index set $A$ is finite, this is easy: given any Borel probability measure $P$ on
$\mathbb{R}^n$, $P$ is by definition the joint distribution of the coordinate functions $X_1,\ldots,X_n$
on the space $(\mathbb{R}^n,\mathcal{B}_{\mathbb{R}^n},P)$. If $A$ is infinite, however, the problem is more delicate.
Suppose to begin with that $\{X_\alpha\}_{\alpha\in A}$ is a family of random variables on some sample
space $(\Omega,\mathcal{B},P)$, and for each ordered $n$-tuple $(\alpha_1,\ldots,\alpha_n)$ of distinct elements of $A$
($n\in\mathbb{N}$) let $P_{(\alpha_1,\ldots,\alpha_n)}$ be the joint distribution of $X_{\alpha_1},\ldots,X_{\alpha_n}$. Then the measures
$P_{(\alpha_1,\ldots,\alpha_n)}$ satisfy the following consistency conditions:


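(10.16) If $\sigma$ is a permutation of $\{1,\ldots,n\}$, then

$$P_{(\alpha_{\sigma(1)},\ldots,\alpha_{\sigma(n)})}\big(E_{\sigma(1)}\times\cdots\times E_{\sigma(n)}\big) = P_{(\alpha_1,\ldots,\alpha_n)}\big(E_1\times\cdots\times E_n\big).$$

(10.17) If $k<n$ and $E\in\mathcal{B}_{\mathbb{R}^k}$, then

$$P_{(\alpha_1,\ldots,\alpha_k)}(E) = P_{(\alpha_1,\ldots,\alpha_n)}(E\times\mathbb{R}^{n-k}).$$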


Conversely, given any family of measures $P_{(\alpha_1,\ldots,\alpha_n)}$ satisfying (10.16) and
(10.17), we shall show that there exist a sample space $(\Omega,\mathcal{B},P)$ and random variables
$\{X_\alpha\}$ on $\Omega$ such that $P_{(\alpha_1,\ldots,\alpha_n)}$ is the joint distribution of $X_{\alpha_1},\ldots,X_{\alpha_n}$. To do
this, it is convenient to make one minor technical modification: We replace $\mathbb{R}$ by
its one-point compactification $\mathbb{R}^* = \mathbb{R}\cup\{\infty\}$. Any Borel measure on $\mathbb{R}^n$ can be
regarded as a Borel measure on $(\mathbb{R}^*)^n$ that assigns measure zero to $(\mathbb{R}^*)^n\setminus\mathbb{R}^n$, and
vice versa. In other words, we allow our random variables to assume the value $\infty$,
although they will do so with probability zero. The point of this is that the space
$(\mathbb{R}^*)^A$ is compact for any $A$, by Tychonoff's theorem.

With this modification, the construction of the sample space $(\Omega,\mathcal{B},P)$ in the case
where the random variables $X_\alpha$ are independent, so that $P$ should be the product of
the $P_{X_\alpha}$'s, is contained in Theorem 7.28. The general case is achieved by a simple
adaptation of the argument given there, which we review in detail for the convenience
of the reader.


10.18 Theorem. Let $A$ be an arbitrary nonempty set, and suppose that for each
ordered $n$-tuple of distinct elements of $A$ ($n\in\mathbb{N}$) we are given a Borel probability
measure $P_{(\alpha_1,\ldots,\alpha_n)}$ on $\mathbb{R}^n$, or equivalently on $(\mathbb{R}^*)^n$, satisfying (10.16) and (10.17).
Then there is a unique Radon probability measure $P$ on the compact Hausdorff space
$\Omega = (\mathbb{R}^*)^A$ such that $P_{(\alpha_1,\ldots,\alpha_n)}$ is the joint distribution of $X_{\alpha_1},\ldots,X_{\alpha_n}$, where
$X_\alpha:\Omega\to\mathbb{R}^*$ is the $\alpha$th coordinate function.


Proof. Let $C_F(\Omega)$ be the set of all $f\in C(\Omega)$ that depend only on finitely many
coordinates. If $f\in C_F(\Omega)$, say $f(x) = F(x_{\alpha_1},\ldots,x_{\alpha_n})$, let

$$I(f) = \int F\,dP_{(\alpha_1,\ldots,\alpha_n)}.$$

$I(f)$ is well defined because of (10.16) and (10.17): If we permute the variables
or add some extra ones, the result is the same. Clearly $I(f)\ge0$ if $f\ge0$, and
$|I(f)|\le\|f\|_u$ with equality when $f$ is constant.

Now, $C_F(\Omega)$ is clearly an algebra that separates points, contains constant functions,
and is closed under complex conjugation, so by the Stone-Weierstrass theorem
it is dense in $C(\Omega)$. Hence, the functional $I$ extends uniquely to a positive linear
functional on $C(\Omega)$ with norm 1, and the Riesz representation theorem therefore
yields a unique Radon measure $P$ on $\Omega$ such that $I(f) = \int f\,dP$ for all $f\in C(\Omega)$.
Let $X_\alpha$ be the $\alpha$th coordinate function on $\Omega$ and let $P_{(X_{\alpha_1},\ldots,X_{\alpha_n})}$ be the joint distribution
of $X_{\alpha_1},\ldots,X_{\alpha_n}$ on $(\mathbb{R}^*)^n$. If $F\in C((\mathbb{R}^*)^n)$ and $f = F\circ(X_{\alpha_1},\ldots,X_{\alpha_n})$ as
above, then

$$\int F\,dP_{(X_{\alpha_1},\ldots,X_{\alpha_n})} = \int f\,dP = I(f) = \int F\,dP_{(\alpha_1,\ldots,\alpha_n)}.$$

$P_{(X_{\alpha_1},\ldots,X_{\alpha_n})}$ and $P_{(\alpha_1,\ldots,\alpha_n)}$ are both Radon measures by Theorem 7.8, so they
are equal by the uniqueness of the Riesz representation. ∎


The only property of $\mathbb{R}^*$ used in this proof is that it is a compact Hausdorff space in
which every open set is $\sigma$-compact, so the theorem admits an obvious generalization.
In particular, if for each $\alpha$ there is a compact set $K_\alpha\subset\mathbb{R}$ such that $P_{(\alpha_1,\ldots,\alpha_n)}$ is
supported in $K_{\alpha_1}\times\cdots\times K_{\alpha_n}$ for all $\alpha_1,\ldots,\alpha_n$, we could take $\Omega = \prod_{\alpha\in A}K_\alpha$ and
thus avoid introducing the point at infinity.

Of special interest is the independent case, in which $P_{(\alpha_1,\ldots,\alpha_n)} = P_{\alpha_1}\times\cdots\times P_{\alpha_n}$.
We state it as a corollary:


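10.19 Corollary. Suppose $\{P_\alpha\}_{\alpha\in A}$ is a family of probability measures on $\mathbb{R}$. Then
there exist a sample space $(\Omega,\mathcal{B},P)$ and independent random variables $\{X_\alpha\}_{\alpha\in A}$
on $\Omega$ such that $P_\alpha$ is the distribution of $X_\alpha$ for every $\alpha\in A$. Specifically, we can
take $\Omega$ to be $(\mathbb{R}^*)^A$ and $X_\alpha$ to be the $\alpha$th coordinate function; if $P_\alpha$ is supported in
the compact set $K_\alpha\subset\mathbb{R}$ for each $\alpha$, we can take $\Omega$ to be $\prod_{\alpha\in A}K_\alpha$.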


Exercises 


23. Given $b\in\mathbb{N}\setminus\{1\}$, let $B = \{0,1,\ldots,b-1\}$, and let $P_0$ be the probability
measure on $B$ (or $\mathbb{R}$) that assigns measure $b^{-1}$ to each point in $B$. Let $P$ be the
measure on $\Omega = B^{\mathbb{N}}$ given by Corollary 10.19, where $A = \mathbb{N}$ and $P_n = P_0$ for all
$n\in\mathbb{N}$, and let $\{X_n\}_1^\infty$ be the coordinate functions on $\Omega$.










a. If $A_1,\ldots,A_n\subset B$,

$$P\Big(\bigcap_1^n X_j^{-1}(A_j)\Big) = b^{-n}\prod_1^n\mathrm{card}(A_j),$$

and $P(\{\omega\}) = 0$ for all $\omega\in\Omega$.

b. Let $\Omega' = \{\omega\in\Omega: X_n(\omega)\neq0$ for infinitely many $n\}$. Then $\Omega\setminus\Omega'$ is
countable and $P(\Omega') = 1$.

c. Define $F:\Omega\to[0,1]$ by $F(\omega) = \sum_1^\infty X_n(\omega)b^{-n}$ (so $F(\omega)$ is the number
such that $\{X_n(\omega)\}$ is the sequence of digits in its base $b$ decimal expansion).
Then $F|\Omega'$ is a bijection from $\Omega'$ to $(0,1]$ that maps $\mathcal{B}_{\Omega'}$ bijectively onto $\mathcal{B}_{(0,1]}$.
($\mathcal{B}_{\Omega'}$ is generated by sets of the form $X_n^{-1}(A)\cap\Omega'$. The image under $F$ of
such a set is a finite union of intervals of the form $(jb^{-n},kb^{-n}]$, and these sets
generate $\mathcal{B}_{(0,1]}$.)

d. The image measure of P under F is Lebesgue measure. 

e. (Borel's Normal Number Theorem) A number $x\in(0,1]$ is called normal
in base $b$ if the digits $0,1,\ldots,b-1$ occur with equal frequency in its base $b$
decimal expansion, that is, if (writing $x = F(\omega)$ with $\omega\in\Omega'$)

$$\lim_{n\to\infty}\frac{\mathrm{card}\{m\in\{1,\ldots,n\}:X_m(\omega) = j\}}{n} = \frac1b\qquad\text{for }j = 0,1,\ldots,b-1.$$

Almost every $x\in(0,1]$ (with respect to Lebesgue measure) is normal in base $b$
for every $b$.


10.5 THE WIENER PROCESS 


It is observed that small particles suspended in a fluid such as water or air undergo 
an irregular motion, known as Brownian motion, due to the collisions of the particles 
with the molecules of the fluid. A physical derivation of the statistical properties of 
Brownian motion was developed independently by Einstein and Smoluchowski, but 
the rigorous mathematical model for Brownian motion — in the limiting case where 
the motion is assumed to result from an infinite number of collisions with molecules 
of infinitesimal size — is due to Wiener. This model, called the Wiener process or 
Brownian motion process, has turned out to be of central importance in probability 
theory and its applications to physics and mathematical analysis. 

One can consider Brownian motion in any number of space dimensions. We shall 
describe the theory in dimension one and indicate how to generalize it. 

The position of a particle undergoing Brownian motion on the line at time $t\ge0$
is considered to be a random variable $X_t$ (on a sample space to be specified later)
satisfying the following conditions. First, as a matter of normalization, we assume
that the particle starts at the origin at time $t = 0$:

$$X_0 = 0\quad\text{(almost surely)}. \tag{10.20}$$
Second, since any given collision affects the particle by only an infinitesimal amount,
it has no long-term effect, so the motion of the particle after time $t$ should depend on
its position $X_t$ at that time but not on its previous history. Thus we assume:

(10.21) If $0\le t_0<t_1<\cdots<t_n$, the random variables $X_{t_j}-X_{t_{j-1}}$ ($1\le j\le n$) are independent.

Third, since the physical processes underlying Brownian motion are homogeneous
in time, we shall postulate that the distribution of $X_t-X_s$ depends only on $t-s$. If
we divide the interval $[s,t]$ into $n$ equal subintervals $[t_0,t_1],\ldots,[t_{n-1},t_n]$ ($t_0 = s$,
$t_n = t$) and write $X_t-X_s = \sum_1^n(X_{t_j}-X_{t_{j-1}})$, it then follows from (10.21) that
$X_t-X_s$ is a sum of $n$ independent identically distributed random variables. Since $n$
can be taken arbitrarily large, the central limit theorem suggests that the distribution
of $X_t-X_s$ should be normal, and this conclusion is also supported by experimental
evidence. Moreover, by Corollary 10.6, $\sigma^2(X_t-X_s) = n\sigma^2(X_{t_1}-X_{t_0})$, and it
follows that $\sigma^2(X_t-X_s) = r\sigma^2(X_{t'}-X_{s'})$ whenever $t-s = r(t'-s')$ and $r$ is
rational; this strongly indicates that $\sigma^2(X_t-X_s)$ should be proportional to $t-s$.
Finally, since the particle is as likely to move to the left as to the right, the mean of
$X_t-X_s$ should be 0. Putting this all together, we are led to the third assumption:


(10.22) There is a constant $C>0$ such that for $0\le s<t$, $X_t-X_s$ has the normal distribution $\nu_0^{C(t-s)}$
with mean 0 and variance $C(t-s)$.
The constant $C$, which expresses the rate of diffusion, is of course related to the
physical parameters of the system. For simplicity, we shall henceforth take $C = 1$.
A family $\{X_t\}_{t\ge0}$ of random variables satisfying (10.20)-(10.22), with $C = 1$,
is called an abstract Wiener process. The generalization to $n$ dimensions can
now be described easily: an $n$-dimensional abstract Wiener process is a family
of $\mathbb{R}^n$-valued random variables $\{X_t\}_{t\ge0}$, where $X_t = (X_t^1,\ldots,X_t^n)$, such that (i)
$\{X_t^j\}_{t\ge0}$ is a one-dimensional abstract Wiener process for each $j$, and (ii) if $Y_j$ is any
function of the variables $\{X_t^j\}_{t\ge0}$ for $j = 1,\ldots,n$, then $Y_1,\ldots,Y_n$ are independent.
In other words, an $n$-dimensional abstract Wiener process is just a Cartesian product
of $n$ one-dimensional abstract Wiener processes. In particular, $X_t-X_s$ has the
$n$-dimensional normal distribution $[\nu_0^{t-s}]^n$ for $t>s$:

$$d[\nu_0^{t-s}]^n(x_1,\ldots,x_n) = [2\pi(t-s)]^{-n/2}\exp\Big(-\frac{x_1^2+\cdots+x_n^2}{2(t-s)}\Big)\,dx_1\cdots dx_n,$$

which has the appropriate sort of spherical symmetry.

We return to the one-dimensional case. The conditions (10.20)-(10.22) completely
determine the joint distributions of the $X_t$'s as follows. If $t_1<\cdots<t_n$, then
$X_{t_1},\,X_{t_2}-X_{t_1},\,\ldots,\,X_{t_n}-X_{t_{n-1}}$ are independent (since $X_0 = 0$ a.s.), so their joint
distribution is the product measure

$$d\nu_0^{t_1}(y_1)\,d\nu_0^{t_2-t_1}(y_2)\cdots d\nu_0^{t_n-t_{n-1}}(y_n).$$
But

$$(X_{t_1},\ldots,X_{t_n}) = T\big(X_{t_1},\,X_{t_2}-X_{t_1},\,\ldots,\,X_{t_n}-X_{t_{n-1}}\big),$$

where

$$T(y_1,\ldots,y_n) = (y_1,\,y_1+y_2,\,\ldots,\,y_1+\cdots+y_n).$$

Since $\det T = 1$, Theorem 2.44 implies that the joint distribution $P_{(t_1,\ldots,t_n)}$ of
$X_{t_1},\ldots,X_{t_n}$ is given by

$$\begin{aligned}dP_{(t_1,\ldots,t_n)}(x_1,\ldots,x_n) &= d\nu_0^{t_n-t_{n-1}}(x_n-x_{n-1})\cdots d\nu_0^{t_2-t_1}(x_2-x_1)\,d\nu_0^{t_1}(x_1)\\ &= \Big[\prod_1^n2\pi(t_j-t_{j-1})\Big]^{-1/2}\exp\Big(-\sum_1^n\frac{(x_j-x_{j-1})^2}{2(t_j-t_{j-1})}\Big)\,dx_1\cdots dx_n,\end{aligned}\tag{10.23}$$

where $t_0 = x_0 = 0$. We thus know $P_{(t_1,\ldots,t_n)}$ when $t_1<\cdots<t_n$, and we obtain it
in the general case by permuting the variables according to (10.16). Also, it follows
easily from (10.23) that (10.17) is satisfied. Therefore, by Theorem 10.18, abstract
Wiener processes exist.
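Formula (10.23) says that the values of the Wiener process at $t_1<\cdots<t_n$ can be sampled by adding independent normal increments with variances $t_j-t_{j-1}$. The sketch below does exactly that and checks a consequence that is not stated in the text but follows from (10.21) and (10.22): for $s\le t$, $E(X_sX_t) = E(X_s^2)+E\big(X_s(X_t-X_s)\big) = s$, so $E(X_sX_t) = \min(s,t)$. The sample times, seed, and number of paths are arbitrary choices.

```python
import math
import random


def sample_path(times, rng):
    """Sample (X_{t_1}, ..., X_{t_n}) for 0 < t_1 < ... < t_n by adding independent
    increments X_{t_j} - X_{t_{j-1}} ~ N(0, t_j - t_{j-1}), with t_0 = 0 and X_0 = 0."""
    values, x, prev = [], 0.0, 0.0
    for t in times:
        x += rng.gauss(0.0, math.sqrt(t - prev))  # gauss takes the standard deviation
        values.append(x)
        prev = t
    return values


rng = random.Random(3)
times = [0.25, 0.5, 1.0, 2.0]
paths = [sample_path(times, rng) for _ in range(20000)]


def empirical_cov(i, j):
    """Sample mean of X_{times[i]} * X_{times[j]} (the true means are 0)."""
    return sum(p[i] * p[j] for p in paths) / len(paths)


for i, s in enumerate(times):
    for j, t in enumerate(times):
        if i <= j:
            print(f"E(X_{s} X_{t}) ~ {empirical_cov(i, j):.3f}   (min(s, t) = {min(s, t)})")
```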

This situation leaves something to be desired, however. Physically, one expects
the position of a particle to be a continuous function of time, so one would like the
sample space for the Wiener process to be $C([0,\infty),\mathbb{R})$ (or some subset thereof)
and the random variable $X_t$ to be evaluation at $t$. Actually, Theorem 10.18 yields
something along these lines: The sample space it provides is the space $(\mathbb{R}^*)^{[0,\infty)}$ of
all functions from $[0,\infty)$ into the compactified line, and $X_t(\omega)$ is indeed $\omega(t)$. We
can therefore achieve our goal by showing that the measure $P$ of Theorem 10.18 is
concentrated on $C([0,\infty),\mathbb{R})$, considered as a subset of $(\mathbb{R}^*)^{[0,\infty)}$. The resulting
realization of the abstract Wiener process on $C([0,\infty),\mathbb{R})$ is what is usually called
the Wiener process.

Henceforth we shall use the notation

$$\Omega = (\mathbb{R}^*)^{[0,\infty)},\qquad\Omega_c = C([0,\infty),\mathbb{R}),$$

and $P$ will denote the Radon measure on $\Omega$ whose finite-dimensional projections are
given by (10.23).

To begin with, we need to make a few comments about the role of the point
at infinity. The function $f(t,s) = |t-s|$ maps $\mathbb{R}^2$ to $[0,+\infty)$ (we write $+\infty$ to
distinguish it from the point at infinity in $\mathbb{R}^*$), and we extend $f$ to a map from
$(\mathbb{R}^*)^2$ to $[0,+\infty]$ by declaring that $|t-\infty| = |\infty-t| = +\infty$ for $t\in\mathbb{R}$ and
$|\infty-\infty| = 0$. When thus extended, $f$ is of course discontinuous at $(\infty,\infty)$, but it is lower
semicontinuous, as the reader may verify (Exercise 24). Thus for $a,t,s\in[0,\infty)$ the
sets $\{\omega\in\Omega:|\omega(t)-\omega(s)|>a\}$ are open and the sets $\{\omega\in\Omega:|\omega(t)-\omega(s)|\le a\}$
are closed in $\Omega$.

Next, we need to make some estimates in terms of the quantity

$$\rho(\epsilon,\delta) = \sup_{0<t\le\delta}\int_{|x|>\epsilon}d\nu_0^t(x) = \sup_{0<t\le\delta}\Big[\frac{2}{\pi t}\Big]^{1/2}\int_\epsilon^\infty e^{-x^2/2t}\,dx.$$
These estimates are contained in the following four lemmas, after which we come to
the main theorem.


10.24 Lemma. For each $\epsilon>0$, $\lim_{\delta\to0}\delta^{-1}\rho(\epsilon,\delta) = 0$.


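Proof. We have

$$\int_\epsilon^\infty e^{-x^2/2t}\,dx\le\int_\epsilon^\infty\frac{x}{\epsilon}\,e^{-x^2/2t}\,dx = \frac{t}{\epsilon}\,e^{-\epsilon^2/2t},$$

whence, since $t\mapsto t^{1/2}e^{-\epsilon^2/2t}$ is increasing, $\rho(\epsilon,\delta)\le(2/\epsilon)(2\delta/\pi)^{1/2}e^{-\epsilon^2/2\delta}$. The exponential term tends to zero faster
than any power of $\delta$, so the result follows. ∎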
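Because the tail probability $\nu_0^t(\{|x|>\epsilon\})$ increases with $t$, the supremum defining $\rho(\epsilon,\delta)$ is attained at $t = \delta$, where it equals $\mathrm{erfc}(\epsilon/\sqrt{2\delta})$ (with $\mathrm{erfc} = 1-\mathrm{erf}$). The following sketch uses this closed form, computed with `math.erfc`, to compare $\rho$ with the bound from the proof and to watch $\delta^{-1}\rho(\epsilon,\delta)$ tend to zero; the value $\epsilon = 0.5$ and the list of $\delta$'s are arbitrary.

```python
import math


def rho(eps, delta):
    """rho(eps, delta) = sup_{t <= delta} nu_0^t({|x| > eps}); the tail
    P(|N(0, t)| > eps) = erfc(eps / sqrt(2 t)) increases with t, so the sup is at t = delta."""
    return math.erfc(eps / math.sqrt(2.0 * delta))


def rho_bound(eps, delta):
    """The estimate (2/eps) (2 delta/pi)^{1/2} exp(-eps^2/(2 delta)) from the proof."""
    return (2.0 / eps) * math.sqrt(2.0 * delta / math.pi) * math.exp(-eps**2 / (2.0 * delta))


eps = 0.5
deltas = (0.1, 0.05, 0.01, 0.005)
for delta in deltas:
    r = rho(eps, delta)
    print(f"delta = {delta:<6}  rho = {r:.3e}  bound = {rho_bound(eps, delta):.3e}"
          f"  rho/delta = {r / delta:.3e}")
```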


10.25 Lemma. Suppose $\epsilon>0$, $\delta>0$, $0\le t_1<\cdots<t_k$, $t_k-t_1\le\delta$, and
$$A = \{\omega : |\omega(t_j)-\omega(t_1)| > 2\epsilon \text{ for some } j\in\{2,\dots,k\}\}.$$
Then $P(A)\le 2\rho(\epsilon,\delta)$.
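Proof. For $j=1,\dots,k$, let
$$B_j = \{\omega : |\omega(t_k)-\omega(t_j)| > \epsilon\},$$
$$D_j = \{\omega : |\omega(t_j)-\omega(t_1)| > 2\epsilon \text{ and } |\omega(t_i)-\omega(t_1)| \le 2\epsilon \text{ for } i<j\}.$$
If $\omega\in A$, then $\omega\in D_j$ for some $j\ge 2$; if $\omega$ is not also in $B_j$, then $\omega$ must be in $B_1$,
for $|\omega(t_k)-\omega(t_1)| > \epsilon$ whenever $|\omega(t_j)-\omega(t_1)| > 2\epsilon$ and $|\omega(t_k)-\omega(t_j)| \le \epsilon$. In
other words,
$$A \subset B_1 \cup \bigcup_{j=2}^k (B_j\cap D_j).$$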
2 


But $P(B_j)\le\rho(\epsilon,\delta)$ by (10.22), and $B_j$ and $D_j$ are independent events by (10.21);
also, the $D_j$'s are clearly disjoint. Therefore,
$$P(A) \le P(B_1) + \sum_{j=2}^k P(B_j)P(D_j) \le \rho(\epsilon,\delta)\Big[1 + \sum_{j=2}^k P(D_j)\Big] \le 2\rho(\epsilon,\delta).$$
∎
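The maximal inequality of Lemma 10.25 can be probed by simulation. The sketch below assumes, as in (10.21)-(10.23), that the increments $\omega(t_{j+1})-\omega(t_j)$ are independent centered Gaussians with variance $t_{j+1}-t_j$. The helpers `estimate_prob_A` and `rho` are hypothetical names. The Monte Carlo estimate illustrates the lemma and proves nothing.

```python
import math
import random


def rho(eps: float, delta: float) -> float:
    """sup_{t <= delta} P(|N(0, t)| > eps); the supremum is attained at t = delta."""
    return math.erfc(eps / math.sqrt(2.0 * delta))


def estimate_prob_A(eps, times, n_samples=5000, seed=0):
    """Monte Carlo estimate of P(A), A = {|w(t_j) - w(t_1)| > 2 eps for some j >= 2}.

    Assumption: w(t_{j+1}) - w(t_j) are independent N(0, t_{j+1} - t_j), as for
    the Wiener process.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_samples):
        x = 0.0  # current value of w(t_j) - w(t_1)
        for a, b in zip(times, times[1:]):
            x += rng.gauss(0.0, math.sqrt(b - a))
            if abs(x) > 2 * eps:
                hits += 1
                break  # first exit time: this sample path lies in D_j, hence in A
    return hits / n_samples


if __name__ == "__main__":
    eps, delta, k = 1.0, 1.0, 20
    times = [delta * j / (k - 1) for j in range(k)]  # t_1 = 0 < ... < t_k = delta
    print(estimate_prob_A(eps, times), "<=", 2 * rho(eps, delta))
```

Take $\epsilon=\delta=1$ and twenty equally spaced times. The estimate should then fall well below $2\rho(1,1)\approx 0.63$. The lemma gives up sharpness to get a bound that depends only on $\epsilon$ and $\delta$, not on $k$.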


10.26 Lemma. With the notation of Lemma 10.25, let
$$E = \{\omega : |\omega(t_i)-\omega(t_j)| > 4\epsilon \text{ for some } i,j\in\{1,\dots,k\}\}.$$
Then $P(E)\le 2\rho(\epsilon,\delta)$.


Proof. If $|\omega(t_i)-\omega(t_j)| > 4\epsilon$, we have either $|\omega(t_i)-\omega(t_1)| > 2\epsilon$ or
$|\omega(t_j)-\omega(t_1)| > 2\epsilon$. Thus $E\subset A$, so the result follows from Lemma 10.25. ∎
10.27 Lemma. Suppose $\epsilon>0$, $0\le a<b$, and $b-a\le\delta$. Let
$$V = \{\omega : |\omega(t)-\omega(s)| > 4\epsilon \text{ for some } t,s\in[a,b]\}.$$
Then $P(V)\le 2\rho(\epsilon,\delta)$.
Proof. If $S$ is a finite subset of $[a,b]$, let
$$V(S) = \{\omega : |\omega(t)-\omega(s)| > 4\epsilon \text{ for some } t,s\in S\}.$$
Then the sets $V(S)$ are open in $\Omega$ by the remarks preceding Lemma 10.24, and their
union is $V$; in particular, $V$ is open. Also, by Lemma 10.26, $P(V(S))\le 2\rho(\epsilon,\delta)$. Since the family
$\{V(S) : S \text{ is a finite subset of } [a,b]\}$ is closed under finite unions, if $K$ is any compact subset
of $V$ we have $K\subset V(S)$ for some $S$ and hence $P(K)\le 2\rho(\epsilon,\delta)$. But then
$P(V)\le 2\rho(\epsilon,\delta)$ by the inner regularity of $P$ on open sets. ∎


10.28 Theorem. Let $\Omega = (\mathbb{R}^*)^{[0,\infty)}$ and $\Omega_c = C([0,\infty),\mathbb{R})$, and let $P$ be the
Radon measure on $\Omega$ whose finite-dimensional projections are given by (10.23),
according to Theorem 10.18. Then $\Omega_c$ is a Borel subset of $\Omega$ and $P(\Omega_c) = 1$.


Proof. A real-valued function $\omega$ on $[0,\infty)$ is continuous iff it is uniformly
continuous on $[0,n]$ for each $n$, and it is uniformly continuous on $[0,n]$ iff for every
$j\in\mathbb{N}$ there exists $k\in\mathbb{N}$ such that $|\omega(t)-\omega(s)|\le j^{-1}$ (note the use of $\le$ rather
than $<$) for all $s\in[0,n]$ and all $t\in[0,n]\cap(s-k^{-1},s+k^{-1})$. Moreover, even if
we only assume that $\omega$ is $\mathbb{R}^*$-valued, this last condition implies that it is real-valued
unless it is identically $\infty$. Therefore, if $\omega_\infty$ denotes the function whose value is
identically $\infty$, we have

$$\Omega_c\cup\{\omega_\infty\} = \bigcap_{n=1}^\infty\bigcap_{j=1}^\infty\bigcup_{k=1}^\infty\ \bigcap_{s,t\in[0,n],\ |t-s|<1/k}\{\omega\in\Omega : |\omega(t)-\omega(s)|\le j^{-1}\}. \tag{10.29}$$

By the remarks preceding Lemma 10.24, $\{\omega : |\omega(t)-\omega(s)|\le j^{-1}\}$ is closed for all
$s$, $t$, and $j$. Hence $\Omega_c\cup\{\omega_\infty\}$ is an $F_{\sigma\delta}$ set, and $\Omega_c$ is therefore a Borel set.
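Moreover, if for $\epsilon,\delta>0$ and $n\in\mathbb{N}$ we set

$$U(n,\epsilon,\delta) = \{\omega\in\Omega : |\omega(t)-\omega(s)| > 8\epsilon \text{ for some } t,s\in[0,n] \text{ with } |t-s|<\delta\},$$

by (10.29) we have
$$\Omega\setminus\Omega_c = \bigcup_{n=1}^\infty\bigcup_{j=1}^\infty\bigcap_{k=1}^\infty U\big(n,(8j)^{-1},k^{-1}\big)\ \cup\ \{\omega_\infty\}.$$

Clearly $P(\{\omega_\infty\}) = 0$, so in order to show that $P(\Omega_c) = 1$, or equivalently that
$P(\Omega\setminus\Omega_c) = 0$, it will suffice to show that $\lim_{k\to\infty}P(U(n,\epsilon,k^{-1})) = 0$ for all $\epsilon$
and $n$.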
The interval $[0,n]$ is the union of the subintervals $[0,k^{-1}]$, $[k^{-1},2k^{-1}]$, $\dots$,
$[n-k^{-1},n]$. If $\omega\in U(n,\epsilon,k^{-1})$, then $|\omega(t)-\omega(s)| > 8\epsilon$ for some $t,s$ lying in the
same subinterval or in adjacent subintervals, and hence, in the notation of Lemma
10.27, $\omega\in V$ where $\delta = k^{-1}$ and $[a,b]$ is one of the subintervals. (In the case of
adjacent subintervals, use their common endpoint as an intermediate point.) As there
are $nk$ subintervals, Lemma 10.27 implies that $P(U(n,\epsilon,k^{-1}))\le 2nk\rho(\epsilon,k^{-1})$.
Lemma 10.24 then shows that $P(U(n,\epsilon,k^{-1}))\to 0$ as $k\to\infty$, which completes
the proof. ∎
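The only quantitative input at the end of the proof is the bound $P(U(n,\epsilon,k^{-1}))\le 2nk\rho(\epsilon,k^{-1})$. The sketch below evaluates $\rho$ through $\operatorname{erfc}$, treating $\nu_t$ as Gaussian, and its helper names are hypothetical. It shows that the bound exceeds 1 for moderate $k$ and becomes small only for large $k$. The proof needs nothing more, since only the limit $k\to\infty$ enters.

```python
import math


def rho(eps: float, delta: float) -> float:
    """sup_{t <= delta} P(|N(0, t)| > eps), attained at t = delta."""
    return math.erfc(eps / math.sqrt(2.0 * delta))


def modulus_bound(n: int, eps: float, k: int) -> float:
    """Bound 2 n k rho(eps, 1/k) on P(U(n, eps, 1/k)) from the proof of Theorem 10.28."""
    return 2 * n * k * rho(eps, 1.0 / k)


if __name__ == "__main__":
    n, eps = 5, 0.1
    for k in [10, 100, 1000, 10_000, 100_000]:
        print(k, modulus_bound(n, eps, k))
```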


Exercises 


24. The function $f : (\mathbb{R}^*)^2\to[0,+\infty]$ defined by $f(t,s) = |t-s|$ for $t,s\in\mathbb{R}$,
$f(\infty,t) = f(t,\infty) = +\infty$ for $t\in\mathbb{R}$, and $f(\infty,\infty) = 0$ is lower semicontinuous.


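25. Let $\Omega = (\mathbb{R}^*)^{[0,\infty)}$, $\{X_t\}_{t\ge 0}$ the coordinate functions on $\Omega$, and for any
$A\subset[0,\infty)$, $\mathcal{M}_A$ = the $\sigma$-algebra generated by $\{X_t\}_{t\in A}$. (Thus $\mathcal{M}_{[0,\infty)}$ is the product
$\sigma$-algebra on $\Omega$ corresponding to the Borel $\sigma$-algebras on the factors.)
a. Suppose $V\in\mathcal{M}_A$ and $\omega,\omega'\in\Omega$. If $\omega\in V$ and $\omega'(t) = \omega(t)$ for all $t\in A$,
then $\omega'\in V$.
b. If $V\in\mathcal{M}_{[0,\infty)}$, then $V\in\mathcal{M}_A$ for some countable set $A$. (Use Exercise 5 in
§1.2.)
c. The set $\Omega_c = C([0,\infty),\mathbb{R})$ is not in $\mathcal{M}_{[0,\infty)}$.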


26. Let $\Omega_c$ and $P$ be as in Theorem 10.28. If $\omega\in\Omega_c$, it can be shown that $\omega$ is
almost surely not of bounded variation, so if $f$ is a Borel measurable function on
$[0,\infty)$, the integral $\int_0^\infty f(t)\,d\omega(t)$ apparently makes no sense. However:

a. If $f = \sum_1^n c_j\chi_{[a_j,b_j)}$ is a step function, define

$$I_f(\omega) = \int_0^\infty f(t)\,d\omega(t) = \sum_1^n c_j[\omega(b_j)-\omega(a_j)].$$

Then $I_f$ is an $L^2$ random variable on $\Omega_c$, with mean 0 and variance $\int_0^\infty|f(x)|^2\,dx$.
(Hint: The intervals $[a_j,b_j)$ may be assumed disjoint.)

b. The map $f\mapsto I_f$ extends to an isometry from $L^2([0,\infty),m)$ to $L^2(\Omega_c,P)$.
c. If $f\in BV([0,\infty))$ is right continuous and $\operatorname{supp}(f)$ is compact, there is a
sequence $\{f_n\}$ of step functions such that $f_n\to f$ in $L^2$ and $df_n\to df$ vaguely,
where $df$ denotes the Lebesgue-Stieltjes measure defined by $f$. (By Exercise 41
in §8.6, there is a sequence $\{\mu_n\}$ of linear combinations of point masses such
that $\mu_n\to df$ vaguely. Consider $f_n(x) = \mu_n((0,x]) + f(0)$.)

d. If $f\in BV([0,\infty))$ is right continuous and $\operatorname{supp}(f)$ is compact, then
$I_f(\omega) = -\int_0^\infty\omega(t)\,df(t)$ almost surely. (Check this directly when $f$ is a step function
and apply (b) and (c).)
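Part (a) of Exercise 26 can be checked empirically. The sketch below illustrates the claim under stated assumptions and does not solve the exercise. It takes the intervals $[a_j,b_j)$ disjoint, as the hint allows. It also samples the increments $\omega(b_j)-\omega(a_j)$ directly as independent $N(0,b_j-a_j)$ variables instead of simulating whole paths. The sample mean and variance of $I_f$ should come out close to $0$ and $\int_0^\infty|f|^2\,dx$. All function names are hypothetical.

```python
import math
import random


def sample_I_f(coeffs, intervals, rng):
    """One sample of I_f(w) = sum_j c_j [w(b_j) - w(a_j)] for f = sum_j c_j chi_[a_j, b_j).

    Assumes the intervals [a_j, b_j) are pairwise disjoint, so the increments are
    independent N(0, b_j - a_j) random variables.
    """
    return sum(c * rng.gauss(0.0, math.sqrt(b - a)) for c, (a, b) in zip(coeffs, intervals))


def l2_norm_squared(coeffs, intervals):
    """Integral of |f|^2 dx for the same step function (disjoint intervals)."""
    return sum(c * c * (b - a) for c, (a, b) in zip(coeffs, intervals))


def mc_mean_var(coeffs, intervals, n_samples=50_000, seed=1):
    """Sample mean and (unbiased) sample variance of I_f."""
    rng = random.Random(seed)
    xs = [sample_I_f(coeffs, intervals, rng) for _ in range(n_samples)]
    mean = sum(xs) / n_samples
    var = sum((x - mean) ** 2 for x in xs) / (n_samples - 1)
    return mean, var


if __name__ == "__main__":
    coeffs = [2.0, -1.0, 0.5]
    intervals = [(0.0, 1.0), (1.0, 3.0), (4.0, 6.0)]
    print(mc_mean_var(coeffs, intervals), l2_norm_squared(coeffs, intervals))
```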
10.6 NOTES AND REFERENCES


The development of probability theory as a rigorous mathematical discipline began 
in the early part of the 20th century, when the tools of measure theory and Lebesgue- 
Stieltjes integrals became available. In 1933 Kolmogorov [85] put the subject on a 
solid foundation by explicitly identifying sample spaces and random variables with 
measure spaces and measurable functions. Since then it has grown extensively. 

More detailed accounts of probability theory on a level comparable to that of this 
book can be found in Billingsley [17], Chung [25], and Lamperti [90]. 


§10.3: An account of the long history of the central limit theorem can be found in
Adams [2]. More general versions of this result exist in which the random variables 
are not assumed to be identically distributed; see the references given above. The 
form of Taylor’s theorem used in the proof is explained in Folland [45]. 

The proof of Stirling’s formula outlined in Exercise 21 is due to Wong [162]; see 
Blyth and Pathak [18] for some other probabilistic proofs of Stirling’s formula. 

There is one more major result about the asymptotic behavior of sums of indepen- 
dent identically distributed random variables that should be mentioned along with 
the law of large numbers and the central limit theorem: 


The Law of the Iterated Logarithm: Suppose that $\{X_n\}_1^\infty$ is a sequence
of independent identically distributed $L^2$ random variables with mean $\mu$ and
variance $\sigma^2$, and let $S_n = \sum_1^n X_j$. Then

$$\limsup_{n\to\infty}\frac{S_n - n\mu}{\sigma\sqrt{2n\log\log n}} = 1 \quad\text{almost surely.}$$

The proof may be found in Chung [25], which gives a more general result; see also
Lamperti [90] for the case in which the $X_n$'s are assumed uniformly bounded.


§10.4: Theorem 10.18 is a variant due to Nelson [103] of the fundamental
existence theorem of Kolmogorov [85]. In Kolmogorov's original construction, the
sample space is $\mathbb{R}^A$ (which could be replaced by $(\mathbb{R}^*)^A$) and the $\sigma$-algebra on which
the measure $P$ is constructed is $\bigotimes_{\alpha\in A}\mathcal{B}_{\mathbb{R}}$. Theorem 10.18 is a decided improvement
on Kolmogorov's theorem, both in the simplicity of its proof and in the fact that the
Borel $\sigma$-algebra on $(\mathbb{R}^*)^A$ properly includes $\bigotimes_{\alpha\in A}\mathcal{B}_{\mathbb{R}}$ when $A$ is uncountable. (The
significance of the latter fact is evident from Exercise 25.)


§10.5: Wiener constructed his measure $P$ on $C([0,\infty),\mathbb{R})$ in [159] and [160];
his approach is quite different from ours. Our proof of Theorem 10.28 follows Nelson
[104]. See also Nelson [105] for some related material, including the derivation of
the postulates (10.20)-(10.22) from physical principles.

A discussion of the many interesting properties of the Wiener process is beyond the
scope of this book. We shall mention only one, as a complement to Theorem 10.28:
The sample paths of the Wiener process are almost surely nowhere differentiable; in
fact, with probability one, at each point they are Hölder continuous of every exponent
$\alpha<\frac12$ but not of exponent $\frac12$. This fact may be startling at first, but it seems almost
inevitable when one reflects that $(\omega(t)-\omega(s))/|t-s|^{1/2}$ has the standard normal
distribution for all $t\ne s$.
Knight [84] is a good source for further information about the Wiener process. 





11


More Measures and Integrals


In this chapter we discuss some additional examples of measures and integrals that 
are of importance in analysis and geometry: invariant measures on locally compact 
groups, geometric measures of lower-dimensional sets in $\mathbb{R}^n$, and integration of
densities and differential forms on manifolds. Although we have grouped these 
topics together in one chapter, they are substantially independent of one another. 


11.1 TOPOLOGICAL GROUPS AND HAAR MEASURE 


A topological group is a group $G$ endowed with a topology such that the group
operations $(x,y)\mapsto xy$ and $x\mapsto x^{-1}$ are continuous from $G\times G$ and $G$ to $G$.
Examples include topological vector spaces (the group operation being addition),
groups of invertible $n\times n$ real matrices (with the relative topology induced from
$\mathbb{R}^{n\times n}$), and all groups equipped with the discrete topology. If $G$ is a topological
group, we denote the identity element of $G$ by $e$, and for $A,B\subset G$ and $x\in G$ we
define
$$xA = \{xy : y\in A\},\qquad Ax = \{yx : y\in A\},$$
$$A^{-1} = \{y^{-1} : y\in A\},\qquad AB = \{yz : y\in A,\ z\in B\}.$$

We say that $A\subset G$ is symmetric if $A = A^{-1}$.
Here are some of the basic properties of topological groups: 


11.1 Proposition. Let G be a topological group. 


a. The topology of $G$ is translation invariant: If $U$ is open and $x\in G$, then $Ux$
and $xU$ are open.

b. For every neighborhood $U$ of $e$ there is a symmetric neighborhood $V$ of $e$ with
$V\subset U$.

c. For every neighborhood $U$ of $e$ there is a neighborhood $V$ of $e$ with $VV\subset U$.

d. If $H$ is a subgroup of $G$, so is $\overline{H}$.

e. Every open subgroup of $G$ is also closed.

f. If $K_1, K_2$ are compact subsets of $G$, so is $K_1K_2$.


Proof. (a) is equivalent to the continuity in each variable of the map $(x,y)\mapsto xy$,
and (b) and (c) are equivalent to the continuity of $x\mapsto x^{-1}$ and $(x,y)\mapsto xy$ at the
identity. (Details are left to the reader.) For (d), if $x,y\in\overline{H}$, there exist nets $\langle x_\alpha\rangle_{\alpha\in A}$,
$\langle y_\beta\rangle_{\beta\in B}$ in $H$ that converge to $x$ and $y$. Then $x_\alpha^{-1}\to x^{-1}$ and $x_\alpha y_\beta\to xy$ (with
the usual product ordering on $A\times B$), so $x^{-1}$ and $xy$ belong to $\overline{H}$. For (e), if $H$ is
an open subgroup, the cosets $xH$ are open for all $x$, so that $G\setminus H = \bigcup_{x\notin H}xH$ is
open and hence $H$ is closed. Finally, (f) is true because $K_1K_2$ is the image of the
compact set $K_1\times K_2$ under the continuous map $(x,y)\mapsto xy$. ∎


If $f$ is a continuous function on the topological group $G$ and $y\in G$, we define the
left and right translates of $f$ through $y$ by

$$L_yf(x) = f(y^{-1}x),\qquad R_yf(x) = f(xy).$$
(The point of using $y^{-1}$ on the left and $y$ on the right is to make $L_{yz} = L_yL_z$ and
$R_{yz} = R_yR_z$.) $f$ is called left (resp. right) uniformly continuous if for every $\epsilon>0$
there is a neighborhood $V$ of $e$ such that $\|L_yf-f\|_u<\epsilon$ (resp. $\|R_yf-f\|_u<\epsilon$)
for $y\in V$. (Some authors reverse the roles of $L_y$ and $R_y$ in this definition.)
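The identities $L_{yz}=L_yL_z$ and $R_{yz}=R_yR_z$ can be verified mechanically on a small nonabelian group. The Python sketch below uses the symmetric group $S_3$ and stores permutations as tuples. The function `naive_L` is a hypothetical variant that omits the inverse. It is included to show why $y^{-1}$ is needed on the left.

```python
from itertools import permutations


def compose(p, q):
    """Product pq in the symmetric group S_3: (pq)(i) = p(q(i))."""
    return tuple(p[q[i]] for i in range(len(q)))


def inverse(p):
    """Inverse permutation of p."""
    inv = [0] * len(p)
    for i, pi in enumerate(p):
        inv[pi] = i
    return tuple(inv)


def L(y, f):
    """Left translate: (L_y f)(x) = f(y^{-1} x)."""
    return lambda x: f(compose(inverse(y), x))


def R(y, f):
    """Right translate: (R_y f)(x) = f(x y)."""
    return lambda x: f(compose(x, y))


def naive_L(y, f):
    """Hypothetical alternative without the inverse: x -> f(y x)."""
    return lambda x: f(compose(y, x))


G = list(permutations(range(3)))
values = {g: float(i * i) for i, g in enumerate(G)}  # an arbitrary injective function on S_3
f = values.__getitem__


def same(F1, F2):
    """True if two functions on G agree at every point."""
    return all(F1(x) == F2(x) for x in G)


ok_left = all(same(L(compose(y, z), f), L(y, L(z, f))) for y in G for z in G)
ok_right = all(same(R(compose(y, z), f), R(y, R(z, f))) for y in G for z in G)
ok_naive = all(same(naive_L(compose(y, z), f), naive_L(y, naive_L(z, f))) for y in G for z in G)
print(ok_left, ok_right, ok_naive)
```

The first two checks hold for any group. The third fails whenever $yz\ne zy$, because $x\mapsto f(yx)$ composes in the reverse order.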


11.2 Proposition. If $f\in C_c(G)$, then $f$ is left and right uniformly continuous.


Proof. We shall consider right uniform continuity; the proof on the left is the
same. Let $K = \operatorname{supp}(f)$ and suppose $\epsilon>0$. For each $x\in K$ there is a neighborhood
$U_x$ of $e$ such that $|f(xy)-f(x)|<\frac12\epsilon$ for $y\in U_x$, and by Proposition 11.1(b,c) there
is a symmetric neighborhood $V_x$ of $e$ with $V_xV_x\subset U_x$. Then $\{xV_x\}_{x\in K}$ covers $K$,
so there exist $x_1,\dots,x_n\in K$ such that $K\subset\bigcup_1^n x_jV_{x_j}$. Let $V = \bigcap_1^n V_{x_j}$; we claim
that $|f(xy)-f(x)|<\epsilon$ for all $x$ if $y\in V$. On the one hand, if $x\in K$, then for some $j$ we
have $x_j^{-1}x\in V_{x_j}$ and hence $xy = x_j(x_j^{-1}x)y\in x_jU_{x_j}$; therefore,

$$|f(xy)-f(x)| \le |f(xy)-f(x_j)| + |f(x_j)-f(x)| < \epsilon.$$

On the other hand, if $x\notin K$, then $f(x) = 0$, and either $f(xy) = 0$ (if $xy\notin K$) or
$x_j^{-1}xy\in V_{x_j}$ for some $j$ (if $xy\in K$); in the latter case $x_j^{-1}x = x_j^{-1}xy\,y^{-1}\in U_{x_j}$,
so that $|f(x_j)|<\frac12\epsilon$ and hence $|f(xy)|<\epsilon$. ∎


One usually assumes that the topology of a topological group is Hausdorff. The 
following proposition shows that this is not much of a restriction. 
11.3 Proposition. Let $G$ be a topological group.
a. If $G$ is $T_1$, then $G$ is Hausdorff.


b. If $G$ is not $T_1$, let $H$ be the closure of $\{e\}$. Then $H$ is a normal subgroup, and
if $G/H$ is given the quotient topology (i.e., a set in $G/H$ is open iff its inverse
image in $G$ is open), $G/H$ is a Hausdorff topological group.


Proof. (a) If $G$ is $T_1$ and $x\ne y\in G$, then $xy^{-1}\ne e$, so some neighborhood of $e$ excludes
$xy^{-1}$; hence by Proposition 11.1(b,c) there is a symmetric neighborhood $V$ of $e$ such that
$xy^{-1}\notin VV$. Then $Vx$ and $Vy$ are disjoint neighborhoods of $x$ and $y$, for if
$z = v_1x = v_2y$ for some $v_1,v_2\in V$, then $xy^{-1} = v_1^{-1}zz^{-1}v_2\in V^{-1}V = VV$, a contradiction.

(b) $H$ is a subgroup by Proposition 11.1d; it is clearly the smallest closed subgroup
of $G$. It follows that $H$ is normal, for if $H'$ were a conjugate of $H$ with $H'\ne H$,
$H'\cap H$ would be a smaller closed subgroup. It is routine to verify that the group
operations on $G/H$ are continuous in the quotient topology, so that $G/H$ is a
topological group. If $\bar e$ is the identity element in $G/H$, then $\{\bar e\}$ is closed since
its inverse image in $G$ is $H$. But then every singleton set in $G/H$ is closed by
Proposition 11.1a, so $G/H$ is $T_1$ and hence Hausdorff. ∎


In the context of Proposition 11.3b, it is easy to see that every Borel measurable 
function on G is constant on the cosets of H and hence is effectively a function on 
G/H. Thus for most purposes one may as well work with the Hausdorff group G/H. 
We shall be interested in the case where G is locally compact, and we henceforth 
use the term locally compact group to mean a topological group whose topology is 
locally compact and Hausdorff. 

Suppose that $G$ is a locally compact group. A Borel measure $\mu$ on $G$ is called
left-invariant (resp. right-invariant) if $\mu(xE) = \mu(E)$ (resp. $\mu(Ex) = \mu(E)$) for
all $x\in G$ and $E\in\mathcal{B}_G$. Similarly, a linear functional $I$ on $C_c(G)$ is called left- or
right-invariant if $I(L_xf) = I(f)$ or $I(R_xf) = I(f)$ for all $f$. A left (resp. right)
Haar measure on $G$ is a nonzero left-invariant (resp. right-invariant) Radon measure
$\mu$ on $G$. For example, Lebesgue measure is a (left and right) Haar measure on $\mathbb{R}^n$, and
counting measure is a (left and right) Haar measure on any group with the discrete
topology. (Other examples will be found in the exercises.) The following proposition
summarizes some elementary properties of Haar measures; in it, and in the sequel,
we employ the notation

$$C_c^+ = \{f\in C_c(G) : f\ge 0 \text{ and } \|f\|_u>0\}.$$


11.4 Proposition. Let $G$ be a locally compact group.

a. A Radon measure $\mu$ on $G$ is a left Haar measure iff the measure $\tilde\mu$ defined by
$\tilde\mu(E) = \mu(E^{-1})$ is a right Haar measure.

b. A nonzero Radon measure $\mu$ on $G$ is a left Haar measure iff $\int f\,d\mu = \int L_yf\,d\mu$
for all $f\in C_c^+$ and $y\in G$.

c. If $\mu$ is a left Haar measure on $G$, then $\mu(U)>0$ for every nonempty open
$U\subset G$, and $\int f\,d\mu>0$ for all $f\in C_c^+$.

d. If $\mu$ is a left Haar measure on $G$, then $\mu(G)<\infty$ iff $G$ is compact.
Proof. (a) is obvious. The “only if” implication of (b) follows by approximating
$f$ by simple functions, and the converse is true by (7.3). As for (c), since $\mu\ne 0$, by the
regularity of $\mu$ there is a compact $K\subset G$ with $\mu(K)>0$. If $U$ is open and nonempty,
$K$ can be covered by finitely many left translates of $U$, and it follows that $\mu(U)>0$.
If $f\in C_c^+$, let $U = \{x : f(x)>\frac12\|f\|_u\}$. Then $\int f\,d\mu\ge\frac12\|f\|_u\,\mu(U)>0$.

Finally, we prove (d). If $G$ is compact, then $\mu(G)<\infty$ since $\mu$ is Radon. If
$G$ is not compact and $V$ is a compact neighborhood of $e$, then $G$ cannot be covered
by finitely many translates of $V$, so by induction we can find a sequence $\{x_n\}$
such that $x_n\notin\bigcup_1^{n-1}x_jV$ for all $n$. By Proposition 11.1(b,c) there is a symmetric
neighborhood $U$ of $e$ such that $UU\subset V$. If $m>n$ and $x_nU\cap x_mU$ is nonempty,
then $x_m\in x_nUU^{-1} = x_nUU\subset x_nV$, a contradiction. Hence $\{x_nU\}_1^\infty$ is a disjoint sequence,
and $\mu(x_nU) = \mu(U)>0$ by (c), whence $\mu(G)\ge\mu(\bigcup_1^\infty x_nU) = \infty$. ∎


Our aim now is to prove the existence and uniqueness of Haar measures. In view 
of Proposition 11.4a, one can pass from left Haar measures to right Haar measures at 
will, so for the sake of definiteness we shall concentrate on left Haar measures. We 
begin with some motivation for the existence proof. 

If $E\in\mathcal{B}_G$ and $V$ is open and nonempty, let $(E:V)$ denote the smallest number
of left translates of $V$ that cover $E$, that is,

$$(E:V) = \inf\Big\{\#(A) : E\subset\bigcup_{x\in A}xV\Big\},$$

where $\#(A) = \operatorname{card}(A)$ if $A$ is finite and $\#(A) = \infty$ otherwise. Thus $(E:V)$ is
a rough measure of the relative sizes of $E$ and $V$. If we fix a precompact open set
$E_0$, the ratio $(E:V)/(E_0:V)$ gives a rough estimate of the size of $E$ when the size
of $E_0$ is normalized to be 1. This estimate becomes more accurate the smaller
$V$ is, and it is obviously left-invariant as a function of $E$. We might therefore hope
to obtain a Haar measure as a limit of the “quasi-measures” $(E:V)/(E_0:V)$ as $V$
shrinks to $\{e\}$.
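In the additive group $\mathbb{R}$ the covering numbers $(E:V)$ can be computed exactly, which makes the heuristic above concrete. The sketch below takes $V=(-r,r)$ and $E=[0,L]$. For simplicity it normalizes by the compact interval $E_0=[0,1]$, whereas the text asks for a precompact open set. The function names are illustrative. As $r\to 0$ the ratio $(E:V)/(E_0:V)$ should approach $L$, the Lebesgue measure of $E$.

```python
import math


def covering_number(length: float, r: float) -> int:
    """(E : V) for E = [0, length] and V = (-r, r) in the additive group R.

    Translates x + V are open intervals of length 2r. An open set containing
    [0, L] has measure > L, so n translates suffice iff 2 r n > L; the minimum
    is therefore floor(L / (2r)) + 1.
    """
    return math.floor(length / (2 * r)) + 1


def quasi_measure(length: float, r: float) -> float:
    """(E : V) / (E_0 : V) with E_0 = [0, 1] used as the normalizing set."""
    return covering_number(length, r) / covering_number(1.0, r)


if __name__ == "__main__":
    for k in [2, 6, 10, 14]:
        r = 2.0 ** -k  # powers of two keep the divisions exact
        print(r, quasi_measure(3.0, r), quasi_measure(0.25, r))
```

Left-invariance is automatic here, since `covering_number` depends only on the length of the interval and not on its position.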

This idea can be made to work as it stands, but it is simpler to carry out if we
think of integrals of functions instead of measures of sets. If $f,\phi\in C_c^+$, then
$\{x : \phi(x)>\frac12\|\phi\|_u\}$ is open and nonempty, so finitely many left translates of it
cover $\operatorname{supp}(f)$, and it follows that

$$f\le\frac{2\|f\|_u}{\|\phi\|_u}\sum_1^n L_{x_j}\phi \quad\text{for some } x_1,\dots,x_n\in G.$$

It therefore makes sense to define the “Haar covering number” of $f$ with respect to
$\phi$:

$$(f:\phi) = \inf\Big\{\sum_1^n c_j : f\le\sum_1^n c_jL_{x_j}\phi \text{ for some } n\in\mathbb{N} \text{ and } x_1,\dots,x_n\in G\Big\}.$$

Clearly $(f:\phi)>0$; in fact, $(f:\phi)\ge\|f\|_u/\|\phi\|_u$.










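11.5 Lemma. Suppose that $f,g,\phi,\psi\in C_c^+$.

a. $(f:\phi) = (L_xf:\phi)$ for any $x\in G$.

b. $(cf:\phi) = c(f:\phi)$ for any $c>0$.

c. $(f+g:\phi)\le(f:\phi)+(g:\phi)$.

d. $(f:\phi)\le(f:\psi)(\psi:\phi)$.

Proof. We have $f\le\sum_1^n c_jL_{x_j}\phi$ iff $L_xf\le\sum_1^n c_jL_{xx_j}\phi$; this proves (a), and
(b) is equally obvious. If $f\le\sum_1^m c_jL_{x_j}\phi$ and $g\le\sum_{m+1}^n c_jL_{x_j}\phi$, then
$f+g\le\sum_1^n c_jL_{x_j}\phi$, so (c) follows by minimizing $\sum_1^m c_j$ and $\sum_{m+1}^n c_j$. Similarly, if
$f\le\sum_j c_jL_{x_j}\psi$ and $\psi\le\sum_k d_kL_{y_k}\phi$, then $f\le\sum_{j,k}c_jd_kL_{x_jy_k}\phi$. Since
$\sum_{j,k}c_jd_k = (\sum_j c_j)(\sum_k d_k)$, (d) follows. ∎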
At this point we make a normalization by choosing $f_0\in C_c^+$ once and for all and
defining
$$I_\phi(f) = \frac{(f:\phi)}{(f_0:\phi)}\quad\text{for } f,\phi\in C_c^+.$$

By Lemma 11.5(a-c), for each fixed $\phi$ the functional $I_\phi$ is left-invariant and bears
some resemblance to a positive linear functional except that it is only subadditive.
Moreover, by Lemma 11.5d it satisfies

$$(f_0:f)^{-1}\le I_\phi(f)\le(f:f_0). \tag{11.6}$$

We now show that, in a certain sense, $I_\phi$ is approximately additive when $\operatorname{supp}(\phi)$ is
small.


11.7 Lemma. If $f_1,f_2\in C_c^+$ and $\epsilon>0$, there is a neighborhood $V$ of $e$ such that
$I_\phi(f_1)+I_\phi(f_2)\le I_\phi(f_1+f_2)+\epsilon$ whenever $\operatorname{supp}(\phi)\subset V$.


Proof. Fix $g\in C_c^+$ such that $g = 1$ on $\operatorname{supp}(f_1+f_2)$, and let $\delta$ be a positive
number to be specified later. Set $h = f_1+f_2+\delta g$ and $h_i = f_i/h$ ($i = 1,2$), where
it is understood that $h_i = 0$ outside $\operatorname{supp}(f_i)$. Then $h_i\in C_c^+$, so by Proposition 11.2
there is a neighborhood $V$ of $e$ such that $|h_i(x)-h_i(y)|<\delta$ if $i = 1,2$ and $y^{-1}x\in V$.
If $\phi\in C_c^+$, $\operatorname{supp}(\phi)\subset V$, and $h\le\sum_j c_jL_{x_j}\phi$, then $|h_i(x)-h_i(x_j)|<\delta$ whenever
$x_j^{-1}x\in\operatorname{supp}(\phi)$, so
$$f_i(x) = h(x)h_i(x)\le\sum_j c_j\phi(x_j^{-1}x)h_i(x)\le\sum_j c_j\phi(x_j^{-1}x)\big[h_i(x_j)+\delta\big].$$
But then $(f_i:\phi)\le\sum_j c_j[h_i(x_j)+\delta]$, and since $h_1+h_2\le 1$,
$$(f_1:\phi)+(f_2:\phi)\le\sum_j c_j[1+2\delta].$$
Now, $\sum_j c_j$ can be made arbitrarily close to $(h:\phi)$, so by Lemma 11.5(b,c),
$$I_\phi(f_1)+I_\phi(f_2)\le(1+2\delta)I_\phi(h)\le(1+2\delta)\big[I_\phi(f_1+f_2)+\delta I_\phi(g)\big].$$
In view of (11.6), therefore, it suffices to choose $\delta$ small enough so that
$$2\delta(f_1+f_2:f_0)+\delta(1+2\delta)(g:f_0)<\epsilon.$$
∎
11.8 Theorem. Every locally compact group $G$ possesses a left Haar measure.


Proof. For each $f\in C_c^+$ let $X_f$ be the interval $[(f_0:f)^{-1},(f:f_0)]$, and let
$X = \prod_{f\in C_c^+}X_f$. Then $X$ is a compact Hausdorff space by Tychonoff's theorem,
and by (11.6), every $I_\phi$ is an element of $X$. For each compact neighborhood $V$ of
$e$, let $K(V)$ be the closure in $X$ of $\{I_\phi : \operatorname{supp}(\phi)\subset V\}$. Clearly
$\bigcap_1^n K(V_j)\supset K(\bigcap_1^n V_j)$, and the latter set is nonempty, so by Proposition 4.21 there is an element $I$ in the intersection of all
the $K(V)$'s. Every neighborhood of $I$ in $X$ intersects $\{I_\phi : \operatorname{supp}(\phi)\subset V\}$ for
all $V$; in other words, for any neighborhood $V$ of $e$ and any $f_1,\dots,f_n\in C_c^+$ and
$\epsilon>0$ there exists $\phi\in C_c^+$ with $\operatorname{supp}(\phi)\subset V$ such that $|I(f_j)-I_\phi(f_j)|<\epsilon$
for $j = 1,\dots,n$. Therefore, in view of Lemmas 11.5 and 11.7, $I$ is left-invariant
and satisfies $I(af+bg) = aI(f)+bI(g)$ for all $f,g\in C_c^+$ and $a,b>0$. It
follows easily, as in the proof of Lemma 7.15, that if we extend $I$ to $C_c(G)$ by setting
$I(f) = I(f^+)-I(f^-)$, then $I$ is a left-invariant positive linear functional on $C_c(G)$.
Moreover, $I(f)>0$ for all $f\in C_c^+$ by (11.6). The proof is therefore completed by
invoking the Riesz representation theorem. ∎


11.9 Theorem. If $\mu$ and $\nu$ are left Haar measures on $G$, there exists $c>0$ such that
$\mu = c\nu$.


Proof. We first present a simple proof that works when $\mu$ is both left- and right-invariant (in particular, when $G$ is Abelian). Pick $h\in C_c^+$ such that
$h(x) = h(x^{-1})$ (e.g., $h(x) = g(x)+g(x^{-1})$ where $g$ is any element of $C_c^+$). Then
for any $f\in C_c(G)$,

$$\begin{aligned}
\int h\,d\mu\int f\,d\nu &= \iint h(x)f(y)\,d\mu(x)\,d\nu(y) = \iint h(y^{-1}x)f(y)\,d\mu(x)\,d\nu(y)\\
&= \iint h(x^{-1}y)f(y)\,d\nu(y)\,d\mu(x) = \iint h(y)f(xy)\,d\nu(y)\,d\mu(x)\\
&= \iint h(y)f(xy)\,d\mu(x)\,d\nu(y) = \int h\,d\nu\int f\,d\mu,
\end{aligned}$$

so that $\mu = c\nu$ where $c = (\int h\,d\mu)/(\int h\,d\nu)$. ($\int h\,d\nu\ne 0$ by Proposition 11.4c, and
Fubini's theorem is applicable since the functions in question are supported in sets
which are compact and hence of finite measure. The same remarks apply below.)
Now, another proof for the general case. The assertion that $\mu = c\nu$ is equivalent
to the assertion that the ratio $r_f = (\int f\,d\mu)/(\int f\,d\nu)$ is independent of $f\in C_c^+$.
Suppose, then, that $f,g\in C_c^+$; we shall show that $r_f = r_g$.
Fix a symmetric compact neighborhood $V_0$ of $e$ and set

$$A = [\operatorname{supp}(f)]V_0\cup V_0[\operatorname{supp}(f)],\qquad B = [\operatorname{supp}(g)]V_0\cup V_0[\operatorname{supp}(g)].$$
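Then $A$ and $B$ are compact by Proposition 11.1f, and for $y\in V_0$ the functions
$x\mapsto f(xy)-f(yx)$ and $x\mapsto g(xy)-g(yx)$ are supported in $A$ and $B$, respectively.
Next, given $\epsilon>0$, by Proposition 11.2 there is a symmetric compact neighborhood
$V\subset V_0$ of $e$ such that $\sup_x|f(xy)-f(yx)|<\epsilon$ and $\sup_x|g(xy)-g(yx)|<\epsilon$ for
$y\in V$. Pick $h\in C_c^+$ with $\operatorname{supp}(h)\subset V$ and $h(x) = h(x^{-1})$. Then

$$\int h\,d\nu\int f\,d\mu = \iint h(y)f(x)\,d\mu(x)\,d\nu(y) = \iint h(y)f(yx)\,d\mu(x)\,d\nu(y),$$

and since $h(x) = h(x^{-1})$,
$$\begin{aligned}
\int h\,d\mu\int f\,d\nu &= \iint h(x)f(y)\,d\mu(x)\,d\nu(y) = \iint h(y^{-1}x)f(y)\,d\mu(x)\,d\nu(y)\\
&= \iint h(x^{-1}y)f(y)\,d\nu(y)\,d\mu(x) = \iint h(y)f(xy)\,d\nu(y)\,d\mu(x)\\
&= \iint h(y)f(xy)\,d\mu(x)\,d\nu(y).
\end{aligned}$$
Thus,

$$\Big|\int h\,d\mu\int f\,d\nu-\int h\,d\nu\int f\,d\mu\Big| = \Big|\iint h(y)\big[f(xy)-f(yx)\big]\,d\mu(x)\,d\nu(y)\Big|\le\epsilon\,\mu(A)\int h\,d\nu.$$

By the same reasoning,

$$\Big|\int h\,d\mu\int g\,d\nu-\int h\,d\nu\int g\,d\mu\Big|\le\epsilon\,\mu(B)\int h\,d\nu.$$

Dividing these inequalities by $(\int h\,d\nu)(\int f\,d\nu)$ and $(\int h\,d\nu)(\int g\,d\nu)$, respectively,
and adding them, we obtain

$$\left|\frac{\int f\,d\mu}{\int f\,d\nu}-\frac{\int g\,d\mu}{\int g\,d\nu}\right|\le\epsilon\left(\frac{\mu(A)}{\int f\,d\nu}+\frac{\mu(B)}{\int g\,d\nu}\right).$$

Since $\epsilon$ is arbitrary, we are done. ∎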




















We conclude this section by investigating the relationship between left and right
Haar measures. If $\mu$ is a left Haar measure on $G$ and $x\in G$, the measure
$\mu_x(E) = \mu(Ex)$ is again a left Haar measure, because of the commutativity of left and right
translations (i.e., the associative law). Hence, by Theorem 11.9 there is a positive
number $\Delta(x)$ such that $\mu_x = \Delta(x)\mu$. The function $\Delta : G\to(0,\infty)$ thus defined
is independent of the choice of $\mu$ by Theorem 11.9 again; it is called the modular
function of $G$.


11.10 Proposition. $\Delta$ is a continuous homomorphism from $G$ to the multiplicative
group of positive real numbers. Moreover, if $\mu$ is a left Haar measure on $G$, for any
$f\in L^1(\mu)$ and $y$ in $G$ we have

$$\int R_yf\,d\mu = \Delta(y^{-1})\int f\,d\mu. \tag{11.11}$$

Proof. For any $x,y\in G$ and $E\in\mathcal{B}_G$,

$$\Delta(xy)\mu(E) = \mu(Exy) = \Delta(y)\mu(Ex) = \Delta(y)\Delta(x)\mu(E),$$

so $\Delta$ is a homomorphism from $G$ to $(0,\infty)$. Also, since $\chi_E(xy) = \chi_{Ey^{-1}}(x)$,

$$\int R_y\chi_E\,d\mu = \mu(Ey^{-1}) = \Delta(y^{-1})\int\chi_E\,d\mu.$$

This proves (11.11) when $f = \chi_E$, and the general case follows by the usual linearity
and approximation arguments. Finally, it is an easy consequence of Proposition 11.2
that the map $y\mapsto\int R_yf\,d\mu$ is continuous for any $f\in C_c(G)$ (Exercise 2), so the
continuity of $\Delta$ follows from (11.11). ∎


Evidently, the left Haar measures on $G$ are also right Haar measures precisely when
$\Delta$ is identically 1, in which case $G$ is called unimodular. Of course, every Abelian
group is unimodular; remarkably enough, groups that are highly noncommutative
are also unimodular. To be precise, let $[G,G]$ denote the smallest closed subgroup
of $G$ containing all elements of the form $[x,y] = xyx^{-1}y^{-1}$. $[G,G]$ is called the
commutator subgroup of $G$; it is normal because $z[x,y]z^{-1} = [zxz^{-1},zyz^{-1}]$,
and it is trivial precisely when $G$ is Abelian.


11.12 Proposition. If $G/[G,G]$ is finite, then $G$ is unimodular.


Proof. Every continuous homomorphism (such as $\Delta$) from $G$ into an Abelian
group must annihilate $[x,y]$ for all $x,y$ and must therefore factor through $G/[G,G]$.
If the latter group is finite, $\Delta(G)$ is a finite subgroup of $(0,\infty)$; but $(0,\infty)$ has no
finite subgroups except $\{1\}$. ∎


11.13 Proposition. If $G$ is compact, then $G$ is unimodular.


Proof. For any $x\in G$, obviously $G = Gx$. Hence if $\mu$ is a left Haar measure,
we have $\mu(G) = \mu(Gx) = \Delta(x)\mu(G)$, and since $0<\mu(G)<\infty$ we conclude that
$\Delta(x) = 1$. ∎


We observed above that if $\mu$ is a left Haar measure, $\tilde\mu(E) = \mu(E^{-1})$ is a right
Haar measure. We now show how to compute it in terms of $\mu$ and $\Delta$.


11.14 Proposition. $d\tilde\mu(x) = \Delta(x)^{-1}\,d\mu(x)$.

Proof. By (11.11), if $f\in C_c(G)$,

$$\int f(x)\Delta(x)^{-1}\,d\mu(x) = \Delta(y)\int f(xy)\Delta(xy)^{-1}\,d\mu(x) = \int R_yf(x)\Delta(x)^{-1}\,d\mu(x).$$

Thus the functional $f\mapsto\int f\Delta^{-1}\,d\mu$ is right-invariant, so its associated Radon
measure is a right Haar measure. However, this Radon measure is simply $\Delta^{-1}\,d\mu$ by
Exercise 9 in §7.2; hence, by Theorem 11.9 (applied to right Haar measures), $\Delta^{-1}\,d\mu = c\,d\tilde\mu$ for some $c>0$. If $c\ne 1$,
we can pick a symmetric neighborhood $U$ of $e$ in $G$ such that $|\Delta(x)^{-1}-1|<\frac12|c-1|$
on $U$. But $\tilde\mu(U) = \mu(U)$, so

$$|c-1|\mu(U) = |c\tilde\mu(U)-\mu(U)| = \Big|\int_U(\Delta(x)^{-1}-1)\,d\mu(x)\Big|\le\tfrac12|c-1|\mu(U),$$
a contradiction, since $\mu(U)>0$. Hence $c = 1$ and $d\tilde\mu = \Delta^{-1}\,d\mu$. ∎


11.15 Corollary. Left and right Haar measures are mutually absolutely continuous. 


Exercises 


1. If $G$ is a topological group and $E\subset G$, then $\overline{E} = \bigcap\{EV : V \text{ is a neighborhood of } e\}$.


2. If $\mu$ is a Radon measure on the locally compact group $G$ and $f\in C_c(G)$, the
functions $x\mapsto\int L_xf\,d\mu$ and $x\mapsto\int R_xf\,d\mu$ are continuous.


3. Let $G$ be a locally compact group that is homeomorphic to an open subset $U$ of
$\mathbb{R}^n$ in such a way that, if we identify $G$ with $U$, left translation is an affine map; that
is, $xy = A_x(y)+b_x$ where $A_x$ is a linear transformation of $\mathbb{R}^n$ and $b_x\in\mathbb{R}^n$. Then
$|\det A_x|^{-1}\,dx$ is a left Haar measure on $G$, where $dx$ denotes Lebesgue measure on
$\mathbb{R}^n$. (Similarly for right translations and right Haar measures.)


4. The following are special cases of Exercise 3.
a. If $G$ is the multiplicative group of nonzero complex numbers $z = x+iy$,
$(x^2+y^2)^{-1}\,dx\,dy$ is a Haar measure.
b. If $G$ is the group of invertible $n\times n$ real matrices, $|\det A|^{-n}\,dA$ is a left
and right Haar measure, where $dA$ = Lebesgue measure on $\mathbb{R}^{n\times n}$. (To see
that the determinant of the map $X\mapsto AX$ is $(\det A)^n$, observe that if $X$ is
the matrix with columns $X^1,\dots,X^n$, then $AX$ is the matrix with columns
$AX^1,\dots,AX^n$.)
c. If $G$ is the group of $3\times 3$ matrices of the form

$$\begin{pmatrix}1 & x & z\\ 0 & 1 & y\\ 0 & 0 & 1\end{pmatrix}\qquad(x,y,z\in\mathbb{R}),$$

then $dx\,dy\,dz$ is a left and right Haar measure.
d. If $G$ is the group of $2\times 2$ matrices of the form

$$\begin{pmatrix}x & y\\ 0 & 1\end{pmatrix}\qquad(x>0,\ y\in\mathbb{R}),$$

then $x^{-2}\,dx\,dy$ is a left Haar measure and $x^{-1}\,dx\,dy$ is a right Haar measure.
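Exercises 3 and 4d can be checked numerically. Let $\rho$ be a density on $G\subset\mathbb{R}^2$. The measure $\rho\,dx\,dy$ is left-invariant iff $\rho(gh)\,|\det D\Phi_g(h)| = \rho(h)$ for all $g,h$, where $\Phi_g$ is the map $h\mapsto gh$; the right-hand case is analogous. The sketch below has three assumptions. It stores the matrix $\begin{pmatrix}x&y\\0&1\end{pmatrix}$ as the pair $(x,y)$. It estimates Jacobians by central differences, which are exact up to rounding here because translations are affine. The helper names are hypothetical. The sketch also evaluates the modular function, which comes out as $\Delta(x,y) = x^{-1}$.

```python
import math


def mul(g, h):
    """Product of [[a, b], [0, 1]] and [[x, y], [0, 1]], stored as pairs (a, b), (x, y)."""
    (a, b), (x, y) = g, h
    return (a * x, a * y + b)


def jacobian_det(F, p, step=1e-6):
    """Determinant of the derivative of F: R^2 -> R^2 at p, by central differences."""
    cols = []
    for i in range(2):
        e = (step, 0.0) if i == 0 else (0.0, step)
        plus = F((p[0] + e[0], p[1] + e[1]))
        minus = F((p[0] - e[0], p[1] - e[1]))
        cols.append(((plus[0] - minus[0]) / (2 * step), (plus[1] - minus[1]) / (2 * step)))
    (a11, a21), (a12, a22) = cols  # cols[i] is the i-th column of the Jacobian matrix
    return a11 * a22 - a12 * a21


def left_density(h):  # x^{-2} dx dy
    return h[0] ** -2


def right_density(h):  # x^{-1} dx dy
    return 1.0 / h[0]


def left_invariant_at(density, g, h):
    """Check density(gh) |det D(h -> gh)| == density(h)."""
    J = jacobian_det(lambda p: mul(g, p), h)
    return math.isclose(density(mul(g, h)) * abs(J), density(h), rel_tol=1e-6)


def right_invariant_at(density, g, h):
    """Check density(hg) |det D(h -> hg)| == density(h)."""
    J = jacobian_det(lambda p: mul(p, g), h)
    return math.isclose(density(mul(h, g)) * abs(J), density(h), rel_tol=1e-6)


def modular_function(g, h=(1.0, 0.0)):
    """Delta(g) from mu(Eg) = Delta(g) mu(E) with mu = left_density dx dy (independent of h)."""
    J = jacobian_det(lambda p: mul(p, g), h)
    return left_density(mul(h, g)) * abs(J) / left_density(h)


if __name__ == "__main__":
    g, h = (2.0, -1.5), (0.7, 3.0)
    print(left_invariant_at(left_density, g, h), right_invariant_at(right_density, g, h))
    print(right_invariant_at(left_density, g, h), modular_function(g))
```

The last assertion in the checks below is Proposition 11.14 in this example: $\Delta(x,y)^{-1}x^{-2} = x^{-1}$.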


5. Let G be as in Exercise 4d. Construct a Borel set in G with finite left Haar 
measure but infinite right Haar measure, and a left uniformly continuous function on 
G that is not right uniformly continuous. 


6. Let $\{G_\alpha\}_{\alpha\in A}$ be a family of topological groups and $G = \prod_{\alpha\in A}G_\alpha$.
a. With the product topology and coordinatewise multiplication, $G$ is a topological
group.
b. If each $G_\alpha$ is compact and $\mu_\alpha$ is the Haar measure on $G_\alpha$ such that $\mu_\alpha(G_\alpha) = 1$,
then the Radon product of the $\mu_\alpha$'s, as constructed in Theorem 7.28, is a Haar
measure on $G$.


7. In Exercise 6, for each $\alpha$ let $G_\alpha$ be the multiplicative group $\{-1,1\}$ with the
discrete topology. Let $\mu$ be a Haar measure on $G$.
a. If $\pi_\alpha : G\to\{-1,1\}$ is the $\alpha$th coordinate function, then $\int\pi_\alpha\pi_\beta\,d\mu = 0$ for
$\alpha\ne\beta$.
b. If $A$ is uncountable, $L^2(\mu)$ is not separable even though $\mu(G)<\infty$.


8. Let $\mathbb{Q}$ have the relative topology induced from $\mathbb{R}$. Then $\mathbb{Q}$ is a topological
group that is not locally compact, and there is no nonzero translation-invariant Borel
measure on $\mathbb{Q}$ that is finite on compact sets.


9. Let $G$ be a locally compact group with left Haar measure $\mu$.
a. $G$ has a subgroup $H$ that is open, closed, and $\sigma$-compact. (Let $H$ be the
subgroup generated by a precompact open neighborhood of $e$.)
b. The restriction of $\mu$ to subsets of $H$ is a left Haar measure on $H$.
c. $\mu$ is decomposable in the sense of Exercise 15 in §3.2.
d. If the topology of $G$ is not discrete, then $\mu(\{x\}) = 0$ for all $x\in G$. In this
case, $\mu$ is regular iff $\mu$ is semifinite iff $\mu$ is $\sigma$-finite iff $G$ is $\sigma$-compact. (See
Exercises 12 and 14 in §7.2.)


11.2 HAUSDORFF MEASURE 


In geometric problems it is important to have a method for measuring the size of
lower-dimensional sets in $\mathbb{R}^n$, such as curves and surfaces in $\mathbb{R}^3$. Differential-geometric
techniques provide such a method that applies to smooth submanifolds of
$\mathbb{R}^n$; see §11.4. However, there is also a measure-theoretic approach to the problem
that applies to more general sets. Indeed, the basic ideas can be carried out just as
easily in arbitrary metric spaces, so we begin by working in this generality.
Let $(X,\rho)$ be a metric space. (See §0.6 for the relevant terminology.) An outer
measure $\mu^*$ on $X$ is called a metric outer measure if

$$\mu^*(A\cup B) = \mu^*(A)+\mu^*(B)\quad\text{whenever } \rho(A,B)>0.$$


11.16 Proposition. If $\mu^*$ is a metric outer measure on $X$, then every Borel subset
of $X$ is $\mu^*$-measurable.


Proof. Since the closed sets generate the Borel $\sigma$-algebra, it suffices to show that
every closed set $F\subset X$ is $\mu^*$-measurable. Thus, given $A\subset X$ with $\mu^*(A)<\infty$,
we wish to show that

$$\mu^*(A)\ge\mu^*(A\cap F)+\mu^*(A\setminus F).$$

Let $B_n = \{x\in A\setminus F : \rho(x,F)\ge n^{-1}\}$. Then $\{B_n\}$ is an increasing sequence of sets
whose union is $A\setminus F$ (since $F$ is closed), and $\rho(B_n,F)\ge n^{-1}$. Therefore,

$$\mu^*(A)\ge\mu^*((A\cap F)\cup B_n) = \mu^*(A\cap F)+\mu^*(B_n),$$

so it will be enough to show that $\mu^*(A\setminus F) = \lim\mu^*(B_n)$. Let $C_n = B_{n+1}\setminus B_n$.
If $x\in C_{n+1}$ and $\rho(x,y)<[n(n+1)]^{-1}$, then

$$\rho(y,F)\le\rho(x,y)+\rho(x,F)<\frac{1}{n(n+1)}+\frac{1}{n+1} = \frac1n,$$

so that $\rho(C_{n+1},B_n)\ge[n(n+1)]^{-1}$. A simple induction therefore shows that

$$\begin{aligned}
\mu^*(B_{2k+1}) &\ge\mu^*(C_{2k}\cup B_{2k-1}) = \mu^*(C_{2k})+\mu^*(B_{2k-1})\\
&\ge\mu^*(C_{2k})+\mu^*(C_{2k-2}\cup B_{2k-3})\ge\cdots\ge\sum_{j=1}^k\mu^*(C_{2j}),
\end{aligned}$$

and similarly $\mu^*(B_{2k})\ge\sum_{j=1}^k\mu^*(C_{2j-1})$. Since $\mu^*(B_n)\le\mu^*(A)<\infty$, it follows
that the series $\sum_1^\infty\mu^*(C_{2j})$ and $\sum_1^\infty\mu^*(C_{2j-1})$ are convergent. But by subadditivity
we have

$$\mu^*(A\setminus F)\le\mu^*(B_n)+\sum_{j=n}^\infty\mu^*(C_j).$$

As $n\to\infty$, the last sum vanishes and we obtain

$$\mu^*(A\setminus F)\le\liminf\mu^*(B_n)\le\limsup\mu^*(B_n)\le\mu^*(A\setminus F),$$

as desired. ∎


We are now ready to define Hausdorff measure. Suppose that $(X,\rho)$ is a metric
space, $p>0$, and $\delta>0$. For $A\subset X$, let

$$H_{p,\delta}(A) = \inf\Big\{\sum_1^\infty(\operatorname{diam}B_j)^p : A\subset\bigcup_1^\infty B_j \text{ and } \operatorname{diam}B_j<\delta \text{ for all } j\Big\},$$

with the convention that $\inf\varnothing = \infty$. As $\delta$ decreases the infimum is being taken over
a smaller family of coverings of $A$, so $H_{p,\delta}(A)$ increases. The limit

$$H_p(A) = \lim_{\delta\to 0}H_{p,\delta}(A)$$

is called the $p$-dimensional Hausdorff (outer) measure of $A$.
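To see the definition at work, consider the middle-thirds Cantor set $C\subset\mathbb{R}$. At stage $n$ of its construction, $C$ is covered by $2^n$ intervals of diameter $3^{-n}$. Hence $H_{p,\delta}(C)\le 2^n3^{-np}$ whenever $3^{-n}<\delta$. The sketch below evaluates these covering sums (the names are illustrative). For $p>\log 2/\log 3$ they tend to $0$. For smaller $p$ they grow without bound and so give no information. At $p=\log 2/\log 3$ they equal $1$. These are only upper bounds: they show $H_p(C)=0$ for $p>\log 2/\log 3$ and $H_p(C)\le 1$ at the critical exponent, and any lower bound needs a separate argument.

```python
import math


def cantor_cover_sum(p: float, n: int) -> float:
    """Sum of (diam B_j)^p over the 2^n stage-n intervals of length 3^-n.

    This is one admissible cover, so it bounds H_{p,delta}(C) from above
    whenever 3^-n < delta.
    """
    return (2 ** n) * (3.0 ** (-n)) ** p


s = math.log(2) / math.log(3)  # critical exponent, about 0.6309

if __name__ == "__main__":
    for p in [0.5, s, 1.0]:
        print(round(p, 4), [round(cantor_cover_sum(p, n), 4) for n in (5, 10, 20)])
```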
Several comments on this definition are in order: 


• The sets $B_j$ in the definition of $H_{p,\delta}$ are arbitrary subsets of $X$. However, one obtains the same result if one requires the $B_j$'s to be closed (because $\operatorname{diam} \overline{B_j} = \operatorname{diam} B_j$), or if one requires the $B_j$'s to be open (because one can replace $B_j$ by the open set $U_j = \{x : \rho(x, B_j) < \epsilon 2^{-j-1}\}$, whose diameter is at most $(\operatorname{diam} B_j) + \epsilon 2^{-j}$). Similarly, if $X = \mathbb{R}$, one can restrict the $B_j$'s to be closed or open intervals.


• The intuition behind the definition of $H_p$ is that if $p$ is an integer and $A$ is a “$p$-dimensional” subset of $\mathbb{R}^n$ such as a relatively open set in a $p$-dimensional linear subspace of $\mathbb{R}^n$, the amount of $A$ that is contained in a region of diameter $r$ should be roughly proportional to $r^p$.


• The restriction to coverings by sets of small diameter is necessary to provide an accurate measure of irregularly shaped sets; otherwise one could simply cover a set by itself, with the result that its measure would be at most the $p$th power of its diameter. Consider, for example, the curve $A_m = \{(x, \sin mx) : |x| \le 1\}$ in $\mathbb{R}^2$. Clearly $\operatorname{diam} A_m \le 2^{3/2}$ for all $m$, but the length of $A_m$ tends to $\infty$ along with $m$. One needs to take $\delta < m^{-1}$ before $H_{1,\delta}(A_m)$ becomes an accurate estimate of the length of $A_m$.
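A quick numerical illustration of the last example, as a sketch assuming NumPy (the helper `polygon_length` and the constant `DIAM_BOUND` are our own names, not the text's): the length of $A_m$, approximated by a fine inscribed polygon, grows roughly in proportion to $m$, while the diameter bound $2^{3/2}$ stays fixed. A single covering set $A_m$ therefore gives $H_{1,\delta}(A_m) \le 2^{3/2}$ for every $\delta \ge 2^{3/2}$, whatever $m$ is.

```python
import numpy as np

def polygon_length(m: int, samples: int = 200_001) -> float:
    """Length of a fine inscribed polygon for A_m = {(x, sin(m x)) : |x| <= 1}."""
    x = np.linspace(-1.0, 1.0, samples)
    y = np.sin(m * x)
    # Sum the Euclidean lengths of consecutive chords (approximates arc length from below).
    return float(np.sum(np.hypot(np.diff(x), np.diff(y))))

# Every A_m lies in the square [-1, 1] x [-1, 1], so diam A_m <= 2**1.5.
DIAM_BOUND = 2.0 ** 1.5

for m in (1, 10, 100):
    print(f"m = {m:3d}: length ~ {polygon_length(m):8.3f}, diameter <= {DIAM_BOUND:.3f}")
```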


We now derive the basic properties of $H_p$.
11.17 Proposition. $H_p$ is a metric outer measure.


Proof. $H_{p,\delta}$ is an outer measure by Proposition 1.10, and it follows that $H_p$ is an outer measure. If $\rho(A, B) > 0$ and $\{C_j\}$ is a covering of $A \cup B$ such that $\operatorname{diam} C_j \le \delta < \rho(A, B)$ for all $j$, then no $C_j$ can intersect both $A$ and $B$. Splitting $\sum (\operatorname{diam} C_j)^p$ into two parts according to whether $C_j \cap B = \varnothing$ or $C_j \cap A = \varnothing$ shows that $\sum (\operatorname{diam} C_j)^p \ge H_{p,\delta}(A) + H_{p,\delta}(B)$, and hence $H_{p,\delta}(A \cup B) \ge H_{p,\delta}(A) + H_{p,\delta}(B)$. As this inequality is valid whenever $\delta < \rho(A, B)$, the desired result follows by letting $\delta \to 0$. ∎


In view of Propositions 11.16 and 11.17, the restriction of $H_p$ to the Borel sets is a measure, which we still denote by $H_p$ and call $p$-dimensional Hausdorff measure.


11.18 Proposition. $H_p$ is invariant under isometries of $X$. Moreover, if $Y$ is any set and $f, g : Y \to X$ satisfy $\rho(f(y), f(z)) \le C\rho(g(y), g(z))$ for all $y, z \in Y$, then $H_p(f(A)) \le C^p H_p(g(A))$ for all $A \subset Y$.


Proof. The first assertion is evident from the definition of $H_p$. As for the second, given $\epsilon, \delta > 0$, cover $g(A)$ by sets $B_j$ such that $\operatorname{diam} B_j \le C^{-1}\delta$ and $\sum (\operatorname{diam} B_j)^p \le H_p(g(A)) + \epsilon$. Then the sets $\tilde B_j = f(g^{-1}(B_j))$ cover $f(A)$, and $\operatorname{diam} \tilde B_j \le C(\operatorname{diam} B_j) \le \delta$, so that

$$H_{p,\delta}(f(A)) \le \sum (\operatorname{diam} \tilde B_j)^p \le C^p H_p(g(A)) + C^p \epsilon.$$

The proof is completed by letting $\delta \to 0$ and $\epsilon \to 0$. ∎


11.19 Proposition. If $H_p(A) < \infty$, then $H_q(A) = 0$ for all $q > p$. If $H_p(A) > 0$, then $H_q(A) = \infty$ for all $q < p$.


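Proof. It suffices to prove the first statement, as the second one is its contrapositive. If $H_p(A) < \infty$, for any $\delta > 0$ there exists $\{B_j\}_1^\infty$ with $A \subset \bigcup B_j$, $\operatorname{diam} B_j \le \delta$, and $\sum (\operatorname{diam} B_j)^p \le H_p(A) + 1$. But if $q > p$,

$$\sum (\operatorname{diam} B_j)^q \le \delta^{q-p} \sum (\operatorname{diam} B_j)^p \le \delta^{q-p}\big(H_p(A) + 1\big),$$

so $H_{q,\delta}(A) \le \delta^{q-p}(H_p(A) + 1)$. Letting $\delta \to 0$, we see that $H_q(A) = 0$. ∎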

According to Proposition 11.19, for any $A \subset X$ the numbers

$$\inf\{p \ge 0 : H_p(A) = 0\} \quad \text{and} \quad \sup\{p \ge 0 : H_p(A) = \infty\}$$

are equal. Their common value is called the Hausdorff dimension of $A$.

From now on we restrict attention to the case $X = \mathbb{R}^n$. Our object is to show that for $p = 1, \ldots, n$, $H_p$ gives the geometrically correct notion of measure (up to a normalization constant) for $p$-dimensional submanifolds of $\mathbb{R}^n$. We begin with the case $p = n$.


11.20 Proposition. There is a constant $\gamma_n > 0$ such that $\gamma_n H_n$ is Lebesgue measure on $\mathbb{R}^n$.


Proof. $H_n$ is a translation-invariant Borel measure on $\mathbb{R}^n$. If $Q \subset \mathbb{R}^n$ is a cube, it is easily verified that $0 < H_n(Q) < \infty$ (Exercise 10). It follows that $H_n \ne 0$ and that $H_n$ is finite on compact sets, whence $H_n$ is a Radon measure by Theorem 7.8. The desired result is therefore a consequence of Theorem 11.9. (The simple argument given there for the Abelian case can be read without going through the rest of §11.1: Simply read $x + y$ for $xy$ and $-x$ for $x^{-1}$.) ∎


The normalization constant $\gamma_n$ turns out to be the volume of a ball of diameter 1, which by Corollary 2.55 is $\pi^{n/2}/2^n\Gamma(\frac12 n + 1)$. We shall not give the proof here, as the value of $\gamma_n$ is irrelevant for our purposes. (The hard part is proving the intuitively obvious fact that among all sets of diameter 1, the ball has the largest volume.) Many authors build $\gamma_p$ into the definition of Hausdorff measure; that is, they define $p$-dimensional Hausdorff measure, for $p \in [0, \infty)$, to be $\gamma_p H_p$ where $\gamma_p = \pi^{p/2}/2^p\Gamma(\frac12 p + 1)$.
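The closed form is easy to tabulate. The short sketch below (the function name `gamma_p` is ours) evaluates $\pi^{p/2}/2^p\Gamma(\frac12 p + 1)$ with the standard-library gamma function and recovers the familiar values $\gamma_1 = 1$ (an interval of length 1), $\gamma_2 = \pi/4$ and $\gamma_3 = \pi/6$. It checks only the formula, not the isodiametric inequality behind it.

```python
import math

def gamma_p(p: float) -> float:
    """pi^(p/2) / (2^p * Gamma(p/2 + 1)); for p = n an integer, the volume of a ball of diameter 1 in R^n."""
    return math.pi ** (p / 2) / (2 ** p * math.gamma(p / 2 + 1))

for n in range(1, 6):
    print(n, round(gamma_p(n), 6))
```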

We now consider lower-dimensional sets in $\mathbb{R}^n$. If $1 \le k \le n$, a $k$-dimensional $C^1$ submanifold of $\mathbb{R}^n$ is a set $M \subset \mathbb{R}^n$ with the following property: For each $x \in M$ there exist a neighborhood $U$ of $x$ in $\mathbb{R}^n$, an open set $V \subset \mathbb{R}^k$, and an injective map $f : V \to U$ of class $C^1$ such that $f(V) = M \cap U$ and the differential $D_x f$ (i.e., the linear map from $\mathbb{R}^k$ to $\mathbb{R}^n$ whose matrix is $[(\partial f_i/\partial x_j)(x)]$) is injective for each $x \in V$. Such an $f$ is called a parametrization of $M \cap U$. Every submanifold $M$ can be covered by countably many $U$'s for which $M \cap U$ has a parametrization, so for our purposes it will suffice to assume that $M = M \cap U$ has a global parametrization.

(There are other common definitions of “submanifold”: $M$ is a $k$-dimensional $C^1$ submanifold if it is locally the set of zeros of a $C^1$ map $g : \mathbb{R}^n \to \mathbb{R}^{n-k}$ such that $D_x g$ is surjective at each $x \in M$, or if it is locally the image of a ball in a $k$-dimensional linear subspace of $\mathbb{R}^n$ under a $C^1$ diffeomorphism of $\mathbb{R}^n$. The equivalence of these definitions with the one given above is a standard exercise in the use of the implicit function theorem.)

We begin our study of submanifolds with the linear case. If $T$ is a linear map from $\mathbb{R}^k$ to $\mathbb{R}^n$ and $T^* : \mathbb{R}^n \to \mathbb{R}^k$ is its transpose, then $T^*T$ is a positive semidefinite linear operator on $\mathbb{R}^k$. Its determinant is therefore nonnegative, and we may define

$$J(T) = \sqrt{\det(T^*T)}.$$
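Computationally, $J(T)$ is the factor by which $T$ scales $k$-dimensional volume. A small sketch assuming NumPy (the name `jacobian_factor` is ours): for two vectors $u, v \in \mathbb{R}^3$, the $3 \times 2$ matrix $T$ with columns $u, v$ has $J(T)$ equal to the area $|u \times v|$ of the parallelogram they span, and in general $J(T)$ is the product of the singular values of $T$.

```python
import numpy as np

def jacobian_factor(T: np.ndarray) -> float:
    """J(T) = sqrt(det(T^* T)) for a real n-by-k matrix T with k <= n."""
    gram = T.T @ T  # the k-by-k matrix T^* T, positive semidefinite
    # Clip tiny negative round-off before taking the square root.
    return float(np.sqrt(max(np.linalg.det(gram), 0.0)))

# Columns u, v of T span a parallelogram in R^3; J(T) should be its area |u x v|.
u = np.array([1.0, 2.0, 0.0])
v = np.array([0.0, 1.0, 3.0])
T = np.column_stack([u, v])

print(jacobian_factor(T))                           # sqrt(46), about 6.7823
print(np.linalg.norm(np.cross(u, v)))               # the same area
print(np.prod(np.linalg.svd(T, compute_uv=False)))  # product of singular values, again the same
```

For a square matrix the Gram determinant is $(\det T)^2$, so the function reduces to $|\det T|$, matching the case $k = n$ in the proof below.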


11.21 Proposition. If $k \le n$, $A \subset \mathbb{R}^k$, and $T : \mathbb{R}^k \to \mathbb{R}^n$ is linear, then $H_k(T(A)) = J(T)\, H_k(A)$.


Proof. If $k = n$, then $\det(T^*T) = (\det T)^2$, so $J(T) = |\det T|$ and the assertion reduces to Theorem 2.44 because of Proposition 11.20. If $k < n$, let $R$ be a rotation of $\mathbb{R}^n$ that maps the range of $T$ into the subspace $\mathbb{R}^k \times \{0\} = \{y \in \mathbb{R}^n : y_j = 0 \text{ for } j > k\}$, and let $S = RT$. Then $S^*S = T^*R^*RT = T^*T$, so that $J(S) = J(T)$, and $H_k(S(A)) = H_k(T(A))$ since $H_k$ is rotation-invariant. But if we identify $\mathbb{R}^k \times \{0\}$ with $\mathbb{R}^k$, $S$ becomes a map from $\mathbb{R}^k$ to itself, and the definition of $S^*S$ is unchanged when this identification is made. We are therefore back in the equidimensional case, which was disposed of above. ∎


It is now easy to guess what the corresponding formula must be for a general smooth injection $f : \mathbb{R}^k \to \mathbb{R}^n$, since locally every smooth map is approximately linear. Our next lemma makes this idea precise.


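11.22 Lemma. Suppose $M$ is a $k$-dimensional $C^1$ submanifold of $\mathbb{R}^n$ parametrized by $f : V \to \mathbb{R}^n$. For any $\alpha > 1$ there is a sequence $\{B_j\}$ of disjoint Borel subsets of $V$ such that $V = \bigcup_1^\infty B_j$, and a sequence $\{T_j\}$ of linear maps from $\mathbb{R}^k$ to $\mathbb{R}^n$, such that

$$\alpha^{-1}|T_j z| \le |(D_x f)z| \le \alpha|T_j z| \quad \text{for } x \in B_j,\ z \in \mathbb{R}^k, \tag{11.23}$$

and

$$\alpha^{-1}|T_j x - T_j y| \le |f(x) - f(y)| \le \alpha|T_j x - T_j y| \quad \text{for } x, y \in B_j. \tag{11.24}$$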


Proof. Let us fix $\epsilon > 0$ and $\beta > 1$ such that

$$\alpha^{-1} + \epsilon \le \beta^{-1} < 1 < \beta \le \alpha - \epsilon,$$

and let $\mathcal{J}$ be a countable dense subset of the set of linear maps from $\mathbb{R}^k$ to $\mathbb{R}^n$ (e.g., the set of matrices with rational entries). For $T \in \mathcal{J}$ and $m \in \mathbb{N}$ let $E(T, m)$ be the set of all $x \in V$ such that

$$\beta^{-1}|Tz| \le |(D_x f)z| \le \beta|Tz| \quad \text{for } z \in \mathbb{R}^k,$$
$$\alpha^{-1}|Tx - Ty| \le |f(x) - f(y)| \le \alpha|Tx - Ty| \quad \text{for all } y \in V \text{ with } |y - x| < m^{-1}.$$


The definition of $E(T, m)$ is unaffected if $y$ and $z$ are restricted to lie in countable dense subsets of $V$ and $\mathbb{R}^k$; hence $E(T, m)$ is defined by countably many inequalities involving continuous functions, so it is a Borel set. It will therefore suffice to show that the sets $E(T, m)$ cover $V$. Indeed, each $E(T, m)$ is a countable union of Borel sets $E_i(T, m)$ of diameter less than $m^{-1}$, so by disjointifying the countable collection $\{E_i(T, m) : T \in \mathcal{J},\ i, m \in \mathbb{N}\}$ we obtain the desired sets $B_j$ and the associated maps $T_j$.

Suppose, then, that $x \in V$, and let $\delta_0 = \inf\{|(D_x f)z| : |z| = 1\}$. Since $D_x f$ is injective, $\delta_0$ is positive. Choose $\delta > 0$ so that $\delta \le (\beta - 1)\delta_0$ and $\delta \le (1 - \beta^{-1})\delta_0$, and then pick $T \in \mathcal{J}$ such that $\|T - D_x f\| < \delta$. Then

$$|Tz| \le |(D_x f)z| + |Tz - (D_x f)z| \le |(D_x f)z| + \delta|z| \le \beta|(D_x f)z|,$$

and similarly $|Tz| \ge \beta^{-1}|(D_x f)z|$. This establishes the first inequality and also shows that $T$ is injective, so $\eta = \inf\{|Tz| : |z| = 1\}$ is positive. Since $f$ is differentiable at $x$, there exists $m \in \mathbb{N}$ such that

$$|f(y) - f(x) - (D_x f)(y - x)| \le \epsilon\eta|y - x| \le \epsilon|T(y - x)| \quad \text{for } |x - y| < m^{-1}.$$

But then

$$|f(y) - f(x)| \le |f(y) - f(x) - (D_x f)(y - x)| + |(D_x f)(y - x)| \le \epsilon|Ty - Tx| + \beta|Ty - Tx| \le \alpha|Ty - Tx|,$$

and similarly $|f(y) - f(x)| \ge \alpha^{-1}|Ty - Tx|$. In short, $x \in E(T, m)$, so we are done. ∎


11.25 Theorem. Let $M$ be a $k$-dimensional $C^1$ submanifold of $\mathbb{R}^n$ parametrized by $f : V \to \mathbb{R}^n$. If $A$ is a Borel subset of $V$, then $f(A)$ is a Borel subset of $\mathbb{R}^n$, and

$$H_k(f(A)) = \int_A J(D_x f)\, dH_k(x). \tag{11.26}$$

Moreover, if $\phi$ is a Borel measurable function on $M$ that is either nonnegative or in $L^1(M, H_k)$, then

$$\int_M \phi(y)\, dH_k(y) = \int_V \phi(f(x))\, J(D_x f)\, dH_k(x). \tag{11.27}$$


Proof. Since $V$ is an open subset of $\mathbb{R}^k$, it is σ-compact. It follows that if $A$ is closed in $V$, then $A$ is σ-compact, and hence $f(A)$ is σ-compact since $f$ is continuous. Since $f$ is injective (so that $f(V \setminus A) = f(V) \setminus f(A)$), the collection of all $A \subset V$ such that $f(A)$ is Borel is therefore a σ-algebra that contains all closed sets, hence all Borel sets. We shall prove (11.26); this establishes (11.27) when $\phi = \chi_{f(A)}$, and the general result follows from the usual linearity and approximation arguments.

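Given $\alpha > 1$, let $\{B_j\}$, $\{T_j\}$ be as in Lemma 11.22, and let $A_j = A \cap B_j$. It follows from (11.23) and Proposition 11.18 that

$$\alpha^{-k} H_k(T_j(E)) \le H_k((D_x f)(E)) \le \alpha^k H_k(T_j(E)) \qquad (x \in A_j,\ E \subset \mathbb{R}^k),$$

and hence by Proposition 11.21,

$$\alpha^{-k} J(T_j) \le J(D_x f) \le \alpha^k J(T_j) \qquad (x \in A_j).$$

But it also follows from (11.24) and Proposition 11.18 that

$$\alpha^{-k} H_k(T_j(A_j)) \le H_k(f(A_j)) \le \alpha^k H_k(T_j(A_j)),$$

and from Proposition 11.21 that $H_k(T_j(A_j)) = J(T_j) H_k(A_j)$. Therefore,

$$\alpha^{-2k} H_k(f(A_j)) \le \alpha^{-k} J(T_j) H_k(A_j) \le \int_{A_j} J(D_x f)\, dH_k(x) \le \alpha^k J(T_j) H_k(A_j) \le \alpha^{2k} H_k(f(A_j)).$$

Summing over $j$, we obtain

$$\alpha^{-2k} H_k(f(A)) \le \int_A J(D_x f)\, dH_k(x) \le \alpha^{2k} H_k(f(A)),$$

so the proof is completed by letting $\alpha \to 1$. ∎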


If both sides of the identities (11.26) and (11.27) are multiplied by the normalizing constant $\gamma_k$ in Proposition 11.20, the integrals on the right become ordinary Lebesgue integrals, and we obtain the formula for measures and integrals on $M$ given by Riemannian geometry (see §11.4). Moreover, if $k = n$ we have $J(D_x f) = |\det D_x f|$, so the result reduces to Theorem 2.47. See also Exercises 11 and 12.
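As a sanity check of (11.26) after multiplication by $\gamma_2$, one can compute the area of the unit sphere $S^2 \subset \mathbb{R}^3$ from the standard spherical-coordinate parametrization $f(\theta, \varphi) = (\sin\theta\cos\varphi, \sin\theta\sin\varphi, \cos\theta)$ on $V = (0, \pi) \times (0, 2\pi)$. This choice of chart is our assumption; it misses only a closed half great circle, which has $H_2$-measure zero by Proposition 11.19. The NumPy sketch below (function names ours) forms $D f$, computes $J(D f) = \sqrt{\det((Df)^* Df)}$ (which here equals $\sin\theta$), and applies a midpoint rule, so the result approximates $\gamma_2 H_2(S^2) = 4\pi$.

```python
import numpy as np

def sphere_param_derivative(theta, phi):
    """D f for f(theta, phi) = (sin t cos p, sin t sin p, cos t); result has shape (..., 3, 2)."""
    d_theta = np.stack([np.cos(theta) * np.cos(phi),
                        np.cos(theta) * np.sin(phi),
                        -np.sin(theta)], axis=-1)
    d_phi = np.stack([-np.sin(theta) * np.sin(phi),
                      np.sin(theta) * np.cos(phi),
                      np.zeros_like(theta)], axis=-1)
    return np.stack([d_theta, d_phi], axis=-1)

def surface_area(n_theta: int = 400, n_phi: int = 800) -> float:
    # Midpoint grid on V = (0, pi) x (0, 2 pi).
    h_t, h_p = np.pi / n_theta, 2 * np.pi / n_phi
    t = (np.arange(n_theta) + 0.5) * h_t
    p = (np.arange(n_phi) + 0.5) * h_p
    theta, phi = np.meshgrid(t, p, indexing="ij")
    D = sphere_param_derivative(theta, phi)        # (n_theta, n_phi, 3, 2)
    gram = np.swapaxes(D, -1, -2) @ D              # (D f)^* (D f), shape (n_theta, n_phi, 2, 2)
    J = np.sqrt(np.clip(np.linalg.det(gram), 0.0, None))
    # Lebesgue integral of J over V, i.e. gamma_2 * H_2(f(V)) by (11.26).
    return float(J.sum() * h_t * h_p)

print(surface_area(), 4 * np.pi)
```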

There remains the question of whether $p$-dimensional Hausdorff measure is of any interest when $p \notin \mathbb{N}$. An affirmative answer will be provided in the next section.


Exercises 


10. Show directly from the definition of $H_n$ that if $Q$ is a cube in $\mathbb{R}^n$, then $0 < H_n(Q) < \infty$. (Hint: There is a constant $C$ such that if $E \subset \mathbb{R}^n$, the Lebesgue measure of $E$ is at most $C(\operatorname{diam} E)^n$.)


11. If $f : (a, b) \to \mathbb{R}^n$ is a parametrization of a smooth curve (i.e., a 1-dimensional $C^1$ submanifold of $\mathbb{R}^n$), the Hausdorff 1-dimensional measure of the curve is

$$\int_a^b |f'(t)|\, dt.$$







12. If $\phi : \mathbb{R}^k \to \mathbb{R}$ is a $C^1$ function, the graph of $\phi$ is a $k$-dimensional $C^1$ submanifold of $\mathbb{R}^{k+1}$ parametrized by $f(x) = (x, \phi(x))$. If $A \subset \mathbb{R}^k$, the $k$-dimensional volume of the portion of the graph lying above $A$ is

$$\int_A \sqrt{1 + |\nabla\phi(x)|^2}\, dx.$$

(First do the linear case, $\phi(x) = a \cdot x$. Show that if $T : \mathbb{R}^k \to \mathbb{R}^{k+1}$ is given by $Tx = (x, a \cdot x)$, then $T^*T = I + S$ where $Sx = (a \cdot x)a$, and hence $\det(T^*T) = 1 + |a|^2$. Hint: The determinant of a matrix is the product of its eigenvalues; what are the eigenvalues of $S$?)


13. In any metric space, zero-dimensional Hausdorff measure is counting measure. 


14. If $A_m$ is a subset of a metric space $X$ of Hausdorff dimension $p_m$, for $m \in \mathbb{N}$, then $\bigcup_1^\infty A_m$ has Hausdorff dimension $\sup_m p_m$.


15. If $A \subset \mathbb{R}^n$ has Hausdorff dimension $p$, then $A \times A \subset \mathbb{R}^{2n}$ has Hausdorff dimension $2p$.


11.3 SELF-SIMILARITY AND HAUSDORFF DIMENSION 


In this section we produce some geometrically interesting examples of sets of
fractional Hausdorff dimension. The sets we consider are “self-similar,” which means
roughly that each small part of the set looks like a shrunken copy of the whole set. We 
begin by establishing the terminology necessary to discuss such sets. Our definitions 
will be more restrictive than is really necessary, since we aim only to give the flavor 
of the theory and to display some examples. 

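For $r > 0$, a similitude with scaling factor $r$ is a map $S : \mathbb{R}^n \to \mathbb{R}^n$ of the form $S(x) = rO(x) + b$, where $O$ is an orthogonal transformation (a rotation or the composition of a rotation and a reflection) and $b \in \mathbb{R}^n$. Suppose that $\mathcal{S} = (S_1, \ldots, S_m)$ is a finite family of similitudes with a common scaling factor $r < 1$. If $E \subset \mathbb{R}^n$, we define

$$\mathcal{S}^0(E) = E, \qquad \mathcal{S}(E) = \bigcup_{1}^{m} S_j(E), \qquad \mathcal{S}^k(E) = \mathcal{S}\big(\mathcal{S}^{k-1}(E)\big) \text{ for } k > 1.$$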


$E$ is called invariant under $\mathcal{S}$ if $\mathcal{S}(E) = E$. In this case, $\mathcal{S}^k(E) = E$ for all $k$, which means that for every $k \ge 1$, $E$ is the union of $m^k$ copies of itself that have been scaled down by a factor of $r^k$. If, in addition, these copies are disjoint or have negligibly small overlap, $E$ can be said to be “self-similar.”

Before proceeding with the theory, let us examine some of the standard examples 
of self-similar sets. They are all obtained by starting with a simple geometric figure, 
applying a family of similitudes repeatedly to it, and passing to the limit. 


• Given $\beta \in (0, \frac12)$, let $C_\beta$ be the Cantor set obtained from $[0, 1]$ by successively removing open middle $(1 - 2\beta)$ths of intervals, as discussed at the end of §1.5. That is, $C_\beta = \bigcap_0^\infty \mathcal{S}^k(E)$ where

$$E = [0, 1], \qquad \mathcal{S} = (S_1, S_2), \qquad S_1(x) = \beta x, \qquad S_2(x) = \beta x + 1 - \beta.$$

See Figure 11.1a.


• The Sierpiński gasket $G$ is the subset of $\mathbb{R}^2$ obtained from a solid triangle by dividing it into four equal subtriangles by bisecting the sides, deleting the middle subtriangle, and then iterating. Thus, if we take the initial triangle $\Delta$ to be the closed triangular region with vertices $(0, 0)$, $(1, 0)$, and $(\frac12, 1)$, then $G = \bigcap_0^\infty \mathcal{S}^k(\Delta)$, where $\mathcal{S} = (S_1, S_2, S_3)$ with

$$S_j(x) = \tfrac12 x + b_j, \qquad b_1 = (0, 0), \quad b_2 = (\tfrac12, 0), \quad b_3 = (\tfrac14, \tfrac12).$$

See Figure 11.1b.


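• The snowflake curve $\Sigma$ is the subset of $\mathbb{R}^2$ obtained from a line segment by replacing its middle third by the other two legs of the equilateral triangle based on that middle third (there are two such triangles; make a definite choice), and then iterating. That is, let $L$ be the broken line joining $(0, 0)$ to $(\frac13, 0)$ to $(\frac12, \frac16\sqrt3)$ to $(\frac23, 0)$ to $(1, 0)$, and let

$$\mathcal{S} = (S_1, S_2, S_3, S_4), \qquad S_1(x) = \tfrac13 x, \quad S_2(x) = \tfrac13 R_{\pi/3}x + (\tfrac13, 0), \quad S_3(x) = \tfrac13 R_{-\pi/3}x + (\tfrac12, \tfrac16\sqrt3), \quad S_4(x) = \tfrac13 x + (\tfrac23, 0),$$

where $R_\theta$ denotes the rotation through the angle $\theta$. Then $\Sigma = \lim_{k \to \infty} \mathcal{S}^k(L)$; more precisely, $\Sigma = \bigcup_0^\infty \mathcal{S}^k(L) \setminus \bigcup_0^\infty \mathcal{S}^k(M)$, where $M$ is the union of the open middle thirds of the line segments that constitute $L$. See Figure 11.1c. (The actual “snowflake” is made by joining three rotated and reflected copies of $\Sigma$ in the same way in which one can join three copies of the initial figure $L$ to make a six-pointed star.)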


The Cantor sets, the Sierpiński gasket, and the snowflake curve are all clearly invariant under the families of similitudes used to generate them. The condition of negligible overlap of the rescaled copies is also satisfied, for in all cases $S_i(E) \cap S_j(E)$ ($i \ne j$) is either empty or a single point.

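We now return to the general theory. Suppose that $\mathcal{S} = (S_1, \ldots, S_m)$ is a family of similitudes with scaling factor $r < 1$. We introduce some notation for the actions of the iterations of $\mathcal{S}$ on points, sets, and measures: For $x \in \mathbb{R}^n$, $E \subset \mathbb{R}^n$, $\mu \in M(\mathbb{R}^n)$, and $i_1, \ldots, i_k \in \{1, \ldots, m\}$, we set

$$x_{i_1 \ldots i_k} = S_{i_1} \circ \cdots \circ S_{i_k}(x), \qquad E_{i_1 \ldots i_k} = S_{i_1} \circ \cdots \circ S_{i_k}(E),$$
$$\mu_{i_1 \ldots i_k}(E) = \mu\big((S_{i_1} \circ \cdots \circ S_{i_k})^{-1}(E)\big).$$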


It is an important property of compact sets that are invariant under a family of 
similitudes that they carry measures with a corresponding invariance property. 
[Fig. 11.1: The first three approximations to (a) the Cantor set $C_{3/8}$, (b) the Sierpiński gasket, and (c) the snowflake curve.]
11.28 Theorem. Suppose that $\mathcal{S} = (S_1, \ldots, S_m)$ is a family of similitudes with scaling factor $r < 1$ and that $X$ is a nonempty compact set that is invariant under $\mathcal{S}$. Then there is a Borel measure $\mu$ on $\mathbb{R}^n$ such that $\mu(\mathbb{R}^n) = 1$, $\operatorname{supp}(\mu) = X$, and for all $k \in \mathbb{N}$,

$$\mu = \frac{1}{m^k} \sum_{i_1, \ldots, i_k = 1}^{m} \mu_{i_1 \ldots i_k}. \tag{11.29}$$


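Proof. We shall construct $\mu$ as a measure on $X$, extending it to $\mathbb{R}^n$ by setting $\mu(X^c) = 0$. Pick $x \in X$, let $\delta_x$ be the point mass at $x$, and for $k \in \mathbb{N}$ define $\mu^k \in M(X)$ by

$$\mu^k = \frac{1}{m^k} \sum_{i_1, \ldots, i_k = 1}^{m} \delta_{x_{i_1 \ldots i_k}},$$

that is, for $f \in C(X)$,

$$\int f\, d\mu^k = \frac{1}{m^k} \sum_{i_1, \ldots, i_k = 1}^{m} f(x_{i_1 \ldots i_k}).$$

Thus each $\mu^k$ is a probability measure on $X$. We claim that the sequence $\{\mu^k\}$ converges vaguely as $k \to \infty$. Indeed, given $f \in C(X)$ and $\epsilon > 0$, there exists $K > 0$ such that $|f(x) - f(y)| < \epsilon$ whenever $x, y \in X$ and $|x - y| \le r^K(\operatorname{diam} X)$. Suppose $l > k \ge K$. Since $x_{i_1 \ldots i_k}$ and $x_{i_1 \ldots i_l}$ both lie in $X_{i_1 \ldots i_k}$ and $\operatorname{diam} X_{i_1 \ldots i_k} = r^k(\operatorname{diam} X)$, we have

$$|f(x_{i_1 \ldots i_k}) - f(x_{i_1 \ldots i_k i_{k+1} \ldots i_l})| < \epsilon.$$

Averaging over $i_{k+1}, \ldots, i_l$ gives

$$\Big| f(x_{i_1 \ldots i_k}) - \frac{1}{m^{l-k}} \sum_{i_{k+1}, \ldots, i_l = 1}^{m} f(x_{i_1 \ldots i_l}) \Big| < \epsilon,$$

and then averaging over $i_1, \ldots, i_k$ yields $|\int f\, d\mu^k - \int f\, d\mu^l| < \epsilon$. Thus the sequence $\{\int f\, d\mu^k\}$ converges for every $f$, and the limit defines a positive linear functional on $C(X)$.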

Let $\mu$ be the associated Radon measure, according to the Riesz representation theorem. Clearly $\mu(X) = \int 1\, d\mu = \lim \int 1\, d\mu^k = 1$. Also, we have $x_{i_1 \ldots i_k} \in X_{i_1 \ldots i_k}$, $\operatorname{diam} X_{i_1 \ldots i_k} = r^k(\operatorname{diam} X)$, and $X = \bigcup X_{i_1 \ldots i_k}$, so the points $x_{i_1 \ldots i_k}$ ($k \in \mathbb{N}$) are dense in $X$; it follows that $\operatorname{supp}(\mu) = X$. Also, from the definition of $\mu^k$ we have

$$\mu^{k+l} = \frac{1}{m^k} \sum_{i_1, \ldots, i_k = 1}^{m} (\mu^l)_{i_1 \ldots i_k}.$$

As $l \to \infty$, $\mu^{k+l}$ and $\mu^l$ both tend vaguely to $\mu$, and composition with similitudes preserves vague convergence, so (11.29) follows. ∎
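The construction in this proof is easy to simulate. For the middle-thirds Cantor set ($\beta = 1/3$, $S_1(x) = x/3$, $S_2(x) = x/3 + 2/3$, starting point $x = 0 \in C_{1/3}$), the sketch below (NumPy assumed; function names are ours) builds the $2^k$ points $x_{i_1 \ldots i_k}$ and evaluates $\int f\, d\mu^k$ as their average. Taking $k = 1$ in (11.29) and integrating $x$ and $x^2$ gives $\int x\, d\mu = \frac13 \int x\, d\mu + \frac13$ and $\int x^2\, d\mu = \frac19 \int x^2\, d\mu + \frac29 \int x\, d\mu + \frac29$, whose solutions are $\frac12$ and $\frac38$; the averages approach these values as $k$ grows.

```python
import numpy as np

def cantor_points(k: int, beta: float = 1/3, x0: float = 0.0) -> np.ndarray:
    """The 2^k points x_{i_1...i_k} for S_1(x) = beta x, S_2(x) = beta x + 1 - beta."""
    pts = np.array([x0])
    for _ in range(k):
        # Applying S_1 and S_2 to every current point gives all compositions one step longer.
        pts = np.concatenate([beta * pts, beta * pts + 1 - beta])
    return pts

def integral_mu_k(f, k: int) -> float:
    """Integral of f against mu^k: the average of f over the points x_{i_1...i_k}."""
    return float(np.mean(f(cantor_points(k))))

for k in (4, 8, 16):
    print(k, integral_mu_k(lambda x: x, k), integral_mu_k(lambda x: x ** 2, k))
```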
The existence of an invariant measure on the invariant set $X$ requires no special hypotheses on the similitudes $S_j$, but in order to be able to compute the Hausdorff dimension of $X$, we need to impose an extra condition which (as we shall see in Theorem 11.33b) guarantees that the sets $S_j(X)$ have negligibly small overlap. Namely, we require $\mathcal{S}$ to possess a separating set: a nonempty bounded open set $U$ such that

$$\mathcal{S}(U) \subset U, \qquad S_i(U) \cap S_j(U) = \varnothing \text{ if } i \ne j. \tag{11.30}$$


The existence of a separating set is more delicate than it might seem at first; the first condition in (11.30) will fail if $U$ is too small, and the second one will fail if $U$ is too big. However, all the examples considered above admit separating sets. As the reader may easily verify, for the Cantor sets $C_\beta$ one can take $U = (0, 1)$, for the Sierpiński gasket one can take $U$ to be the interior of the initial triangular region $\Delta$, and for the snowflake curve one can take $U$ to be the interior of the triangular region with vertices $(0, 0)$, $(1, 0)$, and $(\frac12, \frac16\sqrt3)$, i.e., the interior of the convex hull of the initial figure $L$.


11.31 Proposition. Suppose that $\mathcal{S}$ is a family of similitudes with scaling factor $r < 1$ that admits a separating set $U$. Then there is a unique nonempty compact set that is invariant under $\mathcal{S}$, namely, $\bigcap_0^\infty \mathcal{S}^k(\overline{U})$.


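Proof. Since $\mathcal{S}(U) \subset U$ and the $S_j$'s are continuous, we have

$$\overline{U} \supset \mathcal{S}(\overline{U}) \supset \mathcal{S}^2(\overline{U}) \supset \cdots.$$

It follows that $X = \bigcap_0^\infty \mathcal{S}^k(\overline{U})$ is a compact invariant set for $\mathcal{S}$, and it is nonempty by Proposition 4.21. If $Y$ is another such set, let $d(Y, X) = \max_{y \in Y} \rho(y, X)$ be the maximum distance from points in $Y$ to $X$. Since $S_j$ decreases distances by a factor of $r$, we have $d(S_j(Y), S_j(X)) = r\, d(Y, X)$. But $Y = \bigcup_1^m S_j(Y)$ and $S_j(X) \subset X$, so $d(Y, X) \le \max_j d(S_j(Y), X) \le \max_j d(S_j(Y), S_j(X)) = r\, d(Y, X)$. It follows that $d(Y, X) = 0$, which means that $Y \subset X$ since $X$ and $Y$ are compact. By the same reasoning, $X \subset Y$, so $X$ is the unique compact invariant set. ∎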
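The contraction estimate in this proof can be watched numerically. In the sketch below (NumPy assumed; function names are ours), finite point sets stand in for compact sets, the family is the middle-thirds Cantor family, and `hausdorff_distance` is the symmetric quantity $\max\{d(Y, Z), d(Z, Y)\}$ built from the one-sided $d$ of the proof. Starting from $\{0\}$ and $\{5\}$, each application of $\mathcal{S}$ shrinks the distance by exactly the factor $r = 1/3$, so both iterates converge to the same invariant set.

```python
import numpy as np

BETA = 1 / 3  # middle-thirds Cantor family: S_1(x) = x/3, S_2(x) = x/3 + 2/3

def apply_family(points: np.ndarray) -> np.ndarray:
    """S(Y) = S_1(Y) union S_2(Y) for a finite set Y in R."""
    return np.concatenate([BETA * points, BETA * points + 1 - BETA])

def hausdorff_distance(a: np.ndarray, b: np.ndarray) -> float:
    """max(d(A, B), d(B, A)), where d(A, B) = max over x in A of dist(x, B)."""
    gaps = np.abs(a[:, None] - b[None, :])
    return float(max(gaps.min(axis=1).max(), gaps.min(axis=0).max()))

Y, Z = np.array([0.0]), np.array([5.0])
for k in range(6):
    print(k, hausdorff_distance(Y, Z))  # 5 * 3**(-k)
    Y, Z = apply_family(Y), apply_family(Z)
```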


11.32 Lemma. Let $c, C, \delta$ be positive numbers. Suppose that $\{U_\alpha\}_{\alpha \in A}$ is a collection of disjoint open sets in $\mathbb{R}^n$ such that each $U_\alpha$ contains a ball of radius $c\delta$ and is contained in a ball of radius $C\delta$. Then no ball of radius $\delta$ intersects more than $(1 + 2C)^n c^{-n}$ of the sets $U_\alpha$.

Proof. If $B$ is a ball of radius $\delta$ and $B \cap U_\alpha \ne \varnothing$, then $U_\alpha$ is contained in the ball concentric with $B$ with radius $(1 + 2C)\delta$. Hence, if $N$ of the $U_\alpha$'s intersect $B$, there are $N$ disjoint balls of radius $c\delta$ contained in a ball of radius $(1 + 2C)\delta$. Adding up their Lebesgue measures, we see that $N(c\delta)^n \le [(1 + 2C)\delta]^n$, so $N \le (1 + 2C)^n c^{-n}$. ∎
11.33 Theorem. Suppose that $\mathcal{S} = (S_1, \ldots, S_m)$ is a family of similitudes with scaling factor $r < 1$ that admits a separating set $U$, let $X$ be the unique nonempty compact set that is invariant under $\mathcal{S}$, and let $p = \log_{1/r} m$. Then

a. $0 < H_p(X) < \infty$; in particular, $X$ has Hausdorff dimension $p$.

b. $H_p(S_i(X) \cap S_j(X)) = 0$ for $i \ne j$.


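Proof. For any $k \in \mathbb{N}$ we have $X = \mathcal{S}^k(X) = \bigcup X_{i_1 \ldots i_k}$ and $\operatorname{diam} X_{i_1 \ldots i_k} = r^k(\operatorname{diam} X)$, so if $\delta_k = r^k(\operatorname{diam} X)$,

$$H_{p,\delta_k}(X) \le \sum_{i_1, \ldots, i_k = 1}^{m} (\operatorname{diam} X_{i_1 \ldots i_k})^p = m^k r^{kp} (\operatorname{diam} X)^p = (\operatorname{diam} X)^p,$$

since $r^p = m^{-1}$. Since $\delta_k \to 0$ as $k \to \infty$, it follows that $H_p(X) \le (\operatorname{diam} X)^p < \infty$.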

Next we show that $H_p(X) > 0$. Choose positive numbers $c$ and $C$ so that the separating set $U$ contains a ball of radius $cr^{-1}$ and is contained in a ball of radius $C$, and let $N = (1 + 2C)^n c^{-n}$. We shall prove that $H_p(X) \ge N^{-1}$ by showing that if $\{E_j\}_1^\infty$ is any covering of $X$ by sets of diameter $\le 1$, then $\sum (\operatorname{diam} E_j)^p \ge N^{-1}$. Since any set $E$ with diameter $\delta$ is contained in a (closed) ball of radius $\delta$, it is enough to show that if $X \subset \bigcup B_j$ where $B_j$ is a ball of radius $\delta_j \le 1$, then $\sum \delta_j^p \ge N^{-1}$.

Let $\mu$ be the measure on $X$ given by Theorem 11.28. We claim that if $B$ is any ball of radius $\delta \le 1$, then $\mu(B) \le N\delta^p$. The desired conclusion is an immediate consequence:

$$1 = \mu(X) \le \sum \mu(B_j) \le N \sum \delta_j^p.$$


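To prove the claim, let $k$ be the integer such that $r^k \le \delta < r^{k-1}$. By (11.29),

$$\mu(B) = m^{-k} \sum_{i_1, \ldots, i_k = 1}^{m} \mu_{i_1 \ldots i_k}(B).$$

Since $X \subset \overline{U}$ by Proposition 11.31, we have $\operatorname{supp}(\mu_{i_1 \ldots i_k}) = X_{i_1 \ldots i_k} \subset \overline{U}_{i_1 \ldots i_k}$, so $\mu_{i_1 \ldots i_k}(B) = 0$ unless $B$ intersects $\overline{U}_{i_1 \ldots i_k}$. On the other hand, iteration of (11.30) shows that the sets $U_{i_1 \ldots i_k}$ are all disjoint, and each of them contains a ball of radius $cr^{k-1} > c\delta$ and is contained in a ball of radius $Cr^k \le C\delta$. By Lemma 11.32 (whose proof applies equally well to the closures $\overline{U}_{i_1 \ldots i_k}$), $B$ can intersect at most $N$ of the sets $\overline{U}_{i_1 \ldots i_k}$. Therefore,

$$\mu(B) \le N m^{-k} = N r^{kp} \le N \delta^p,$$

as claimed.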

Finally, since $S_j$ decreases distances by a factor of $r$, we have $H_p(S_j(X)) = r^p H_p(X) = m^{-1} H_p(X)$ and hence $H_p(X) = \sum_1^m H_p(S_j(X))$. But since $X = \bigcup_1^m S_j(X)$ and $H_p(X) < \infty$, this can happen only if $H_p(S_i(X) \cap S_j(X)) = 0$ for $i \ne j$. ∎


11.34 Corollary. The Cantor sets $C_\beta$, the Sierpiński gasket, and the snowflake curve have Hausdorff dimension $\log_{1/\beta} 2$, $\log_2 3$, and $\log_3 4$, respectively.
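Numerically, these dimensions are $\log 2/\log 3 \approx 0.631$ for $C_{1/3}$, $\log_2 3 \approx 1.585$ and $\log_3 4 \approx 1.262$. The sketch below (function names ours) computes $p = \log_{1/r} m$ directly. As a heuristic cross-check for $C_{1/3}$, it also counts how many boxes $[a, a + 3^{-j})$ contain at least one left endpoint of the level-$k$ intervals. This box count is a different notion of dimension, not defined in this text, which is known to agree with Hausdorff dimension for self-similar sets admitting a separating set. The endpoints are stored as integers (ternary digits 0 or 2) so that the box indices are computed without rounding error.

```python
import math

def similarity_dimension(m: int, r: float) -> float:
    """p = log_{1/r} m, the exponent with m * r**p = 1 (Theorem 11.33)."""
    return math.log(m) / math.log(1 / r)

print(similarity_dimension(2, 1 / 3))  # middle-thirds Cantor set C_{1/3}
print(similarity_dimension(3, 1 / 2))  # Sierpinski gasket, log_2 3
print(similarity_dimension(4, 1 / 3))  # snowflake curve, log_3 4

def cantor_box_counts(k: int, j_max: int) -> list:
    """Number of boxes [a, a + 3^-j) containing a level-k left endpoint of C_{1/3}, j = 1..j_max.

    Endpoints are integers n with x = n / 3**k whose k ternary digits are 0 or 2,
    so the box index floor(x * 3**j) equals n // 3**(k - j) exactly.
    """
    ends = [0]
    for _ in range(k):
        ends = [3 * n for n in ends] + [3 * n + 2 for n in ends]
    return [len({n // 3 ** (k - j) for n in ends}) for j in range(1, j_max + 1)]

counts = cantor_box_counts(k=10, j_max=8)
print(counts)  # 2, 4, 8, ...: N(3^-j) = 2^j, so log N / log 3^j = log_3 2
```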
Exercises


16. Modify the construction of the snowflake curve by using isosceles triangles rather than equilateral ones. That is, given $\frac14 < \beta < \frac12$, let $L$ be the broken line connecting $(0, 0)$ to $(\beta, 0)$ to $\frac12(1, \sqrt{4\beta - 1})$ to $(1 - \beta, 0)$ to $(1, 0)$. Proceeding inductively, let $L_k$ be the figure obtained by replacing each line segment in $L_{k-1}$ by a copy of $L$, scaled down by a factor of $\beta^k$, and let $\Sigma_\beta$ be the limiting set. (Thus $\Sigma_{1/3}$ is the ordinary snowflake curve.) Find the family of similitudes under which $\Sigma_\beta$ is invariant, show that it possesses a separating set, and find the Hausdorff dimension of $\Sigma_\beta$.


17. Investigate analogues of the Sierpiński gasket constructed from squares, or 
higher-dimensional cubes, rather than triangles. (There are various possibilities 
here.) 


18. Given $n \in \mathbb{N}$ and $p \in (0, n)$, construct Borel sets $E_1, E_2, E_3 \subset \mathbb{R}^n$ of Hausdorff dimension $p$, with the following properties.

a. $0 < H_p(E_1) < \infty$. (Exercise 15 or Exercise 17 could be used here.)

b. $H_p(E_2) = \infty$. (Use (a).)

c. $H_p(E_3) = 0$. (Use Exercise 14.)


19. The measure $\mu$ in Theorem 11.28 is unique.


11.4 INTEGRATION ON MANIFOLDS 


This section is a brief essay on integration on manifolds for the benefit of those 
who are familiar with the language of differential geometry and wish to see how 
the geometric notions of integration fit into the measure-theoretic framework. The 
discussion and the notation will both be quite informal. 

Let $M$ be a $C^\infty$ manifold of dimension $m$. (We work in the $C^\infty$ category for convenience; $C^1$ would actually suffice.) Given a coordinate system $x = (x_1, \ldots, x_m)$ on an open set $U \subset M$, one can consider Lebesgue measure $dx = dx_1 \cdots dx_m$ on $U$. This has no intrinsic geometric significance, for if $y = (y_1, \ldots, y_m)$ is another coordinate system on $U$, by Theorem 2.47 we have $dy = |\det(\partial y/\partial x)|\, dx$ where $(\partial y/\partial x)$ denotes the matrix $(\partial y_i/\partial x_j)$. However, $dy$ and $dx$ are mutually absolutely continuous, and the Radon-Nikodym derivative $|\det(\partial y/\partial x)|$ is a $C^\infty$ function. It therefore makes sense to define a smooth measure on $M$ to be a Borel measure $\mu$ which, in any local coordinates $x$, has the form $d\mu = \phi^x\, dx$ where $\phi^x$ is a nonnegative $C^\infty$ function. The representations of $\mu$ in different coordinate systems are then related by

$$\phi^x = |\det(\partial y/\partial x)|\, \phi^y. \tag{11.35}$$

(This conveniently sloppy notation is in the same spirit as the formula $du/dt = (du/dx)(dx/dt)$ for the chain rule. More precisely, if $y = F(x)$, then $\phi^x = |\det(\partial y/\partial x)|(\phi^y \circ F)$.)
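A concrete instance of (11.35), offered as an illustration of our own choosing: on $\mathbb{R}^2$ minus a closed half-line, let $y = (y_1, y_2)$ be Cartesian coordinates and let $\mu$ be Lebesgue measure, so $\phi^y = 1$; let $x = (r, \theta)$ be polar coordinates, with $y = (r\cos\theta, r\sin\theta)$. Then $\det(\partial y/\partial x) = r$, and (11.35) gives $\phi^x = r$, the familiar $d\mu = r\, dr\, d\theta$. The NumPy sketch below (function names ours) integrates $e^{-|y|^2}$ in both charts with a midpoint rule; both values approximate $\pi$.

```python
import numpy as np

def midpoint_grid(a: float, b: float, n: int):
    """Midpoints of n equal subintervals of (a, b), and the common width."""
    h = (b - a) / n
    return a + (np.arange(n) + 0.5) * h, h

def gaussian_cartesian(L: float = 6.0, n: int = 600) -> float:
    # phi^y = 1: plain Lebesgue measure dy_1 dy_2 on the square [-L, L]^2.
    y, h = midpoint_grid(-L, L, n)
    Y1, Y2 = np.meshgrid(y, y, indexing="ij")
    return float(np.exp(-(Y1 ** 2 + Y2 ** 2)).sum() * h * h)

def gaussian_polar(R: float = 6.0, n_r: int = 2000, n_t: int = 64) -> float:
    # phi^x = |det(dy/dx)| * phi^y = r in the chart x = (r, theta).
    r, hr = midpoint_grid(0.0, R, n_r)
    t, ht = midpoint_grid(0.0, 2 * np.pi, n_t)
    Rg, _ = np.meshgrid(r, t, indexing="ij")
    return float((np.exp(-Rg ** 2) * Rg).sum() * hr * ht)

print(gaussian_cartesian(), gaussian_polar(), np.pi)
```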
Equation (11.35) may be interpreted in the language of geometry as follows. The functions $|\det(\partial y/\partial x)|$ are the transition functions for a line bundle on $M$; a section of this bundle, called a density on $M$, is an object that is represented in each local coordinate system $x$ by a function $\phi^x$, such that the functions for different coordinate systems are related by (11.35). In short, smooth measures can be identified with nonnegative densities on $M$. More generally, any density $\phi$ defines, at least locally, a smooth signed or complex measure $\mu$ on $M$, so the integral $\int_K \phi = \mu(K)$ is well defined for any compact $K \subset M$, as is $\int f\phi = \int f\, d\mu$ for any $f \in C_c(M)$.

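Suppose now that $M$ is equipped with a Riemannian metric. In any coordinate system $x$, the metric is represented by a positive definite matrix-valued function $g^x = (g^x_{ij})$. The matrix $g^y$ for another coordinate system is related to $g^x$ by

$$g^x_{ij} = \sum_{k,l} \frac{\partial y_k}{\partial x_i} \frac{\partial y_l}{\partial x_j}\, g^y_{kl}, \qquad \text{or} \qquad g^x = \Big(\frac{\partial y}{\partial x}\Big)^{T} g^y \Big(\frac{\partial y}{\partial x}\Big),$$

so that

$$\det g^x = \Big[\det\Big(\frac{\partial y}{\partial x}\Big)\Big]^2 \det g^y.$$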


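It follows that $\sqrt{\det g}$ is a positive density on $M$ canonically associated to the metric $g$; it is called the Riemannian volume density on $M$. In particular, if $M$ is a submanifold of $\mathbb{R}^n$ ($n > m$), it inherits a Riemannian structure from the ambient Euclidean structure. If $M$ is parametrized by $f : V \to \mathbb{R}^n$ as in §11.2, the metric $g$ is given in the coordinates induced by $f$ by

$$g_{ij} = \sum_{k=1}^{n} \frac{\partial f_k}{\partial x_i} \frac{\partial f_k}{\partial x_j} = \frac{\partial f}{\partial x_i} \cdot \frac{\partial f}{\partial x_j}, \qquad \text{i.e.,} \qquad g = (D_x f)^* (D_x f),$$

so that $\sqrt{\det g} = J(D_x f)$. Theorem 11.25 therefore asserts that integration of the Riemannian volume density gives $m$-dimensional Hausdorff measure on $M$, up to the factor $\gamma_m$.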

These ideas yield an easy construction of a left Haar measure on any Lie group, that is, any topological group that is a $C^\infty$ manifold and whose group operations are $C^\infty$. Namely, choose an inner product on the tangent space at the identity element and transport it to every other point by left translation. The result is a left-invariant Riemannian metric, and the associated volume density defines a left Haar measure.

The most popular things to integrate on manifolds are differential forms. For our purposes it will suffice to describe a differential $m$-form on an $m$-dimensional manifold $M$ as a section of the line bundle whose transition functions are $\det(\partial y/\partial x)$. That is, a differential $m$-form $\omega$ is given in local coordinates $x$ by a function $\omega^x$, and the function $\omega^y$ for a different coordinate system is related to $\omega^x$ by $\omega^x = \det(\partial y/\partial x)\, \omega^y$. (The usual notation is $\omega = \omega^x\, dx_1 \wedge \cdots \wedge dx_m$.) Differential $m$-forms thus look just like densities if one restricts oneself to coordinate systems whose Jacobian matrices have positive determinant. If it is possible to do this consistently on all of $M$, $M$ is called orientable. In this case, assuming that $M$ is connected, the coordinate systems on subsets of $M$ fall into two classes such that within each class one always has $\det(\partial y/\partial x) > 0$; a choice of one of these classes is an orientation of $M$. (In $\mathbb{R}^3$, for example, one speaks of left-handed or right-handed coordinates.) If $M$ is equipped with an orientation, therefore, differential $m$-forms may be identified with densities and as such may be integrated over compact subsets of $M$.

The notion of density can be generalized. If $0 \le \theta \le 1$, a $\theta$-density on $M$ is a section of the line bundle whose transition functions are $|\det(\partial y/\partial x)|^\theta$. (Thus, a 1-density is a density, and a 0-density is just a smooth function.) Suppose $\theta > 0$ and $p = \theta^{-1}$. If $\phi$ is a $\theta$-density, $|\phi|^p$ is well defined as a nonnegative density and so can be integrated over $M$. The set of $\theta$-densities $\phi$ such that $\|\phi\|_p = (\int |\phi|^p)^{1/p} < \infty$ is a normed linear space whose completion $L^p(M)$ is called the intrinsic $L^p$ space of $M$. The duality results of Chapter 6 work in this setting: If $\phi_j$ is a $\theta_j$-density for $j = 1, 2$, where $\theta_1 + \theta_2 = 1$, then $\phi_1\phi_2$ is a density and $|\int \phi_1\phi_2| \le \|\phi_1\|_{p_1} \|\phi_2\|_{p_2}$, where $p_j = \theta_j^{-1}$, and $L^{p_1}(M) \cong (L^{p_2}(M))^*$.


11.5 NOTES AND REFERENCES 


§11.1: The existence and uniqueness of Haar measure were first proved by Haar [59] and von Neumann [155], respectively, for groups whose topology is second countable; the general case is due to Weil [158]. Our proof of existence and uniqueness follows Weil [158] and Loomis [94]. There is another proof, due to H. Cartan, which yields existence and uniqueness simultaneously and avoids the use of the axiom of choice (which we invoked via Tychonoff’s theorem). This proof, as well as further references and historical remarks, can be found in Hewitt and Ross [75, §15].

Haar measure is the foundation for harmonic analysis on locally compact groups. 
The articles by Graham, Weiss, and Sally in Ash [7] provide a good introduction to 
this field; a more extensive treatment can be found in Folland [47]. 


§11.2: Proposition 11.16 is due to Carathéodory [22], and the theory of Hausdorff measure was developed in Hausdorff [69]. The computation of the constant $\gamma_n$ can be found in Billingsley [17] or Falconer [39, §1.4]. There are other ways of defining lower-dimensional measures on $\mathbb{R}^n$, all of which agree on smooth submanifolds but sometimes differ on more irregular sets; see Federer [41].

The concept of Hausdorff measure can be generalized. If $h$ is any strictly increasing continuous function on $[0, \infty)$ such that $h(0) = 0$, for a subset $A$ of a metric space $X$ one can define

$$H_{h,\delta}(A) = \inf\Big\{ \sum_{1}^{\infty} h(\operatorname{diam} B_j) : A \subset \bigcup_{1}^{\infty} B_j,\ \operatorname{diam} B_j \le \delta \Big\}$$

and $H_h(A) = \lim_{\delta \to 0} H_{h,\delta}(A)$. Thus $H_p = H_h$ where $h(t) = t^p$. Rogers [119] contains a systematic treatment of these generalized Hausdorff measures.


§11.3: The computation of the Hausdorff dimension of the Cantor sets $C_\beta$ goes back to Hausdorff [69]. The arguments presented here are due to Hutchinson [78]; they can easily be extended to families of similitudes with different scaling factors, that is, $\mathcal{S} = (S_1, \ldots, S_m)$ where $S_j$ has scaling factor $r_j < 1$. Such a family always has a unique nonempty compact invariant set $X$, and if it possesses a separating set, the Hausdorff dimension of $X$ is the number $p$ such that $\sum_1^m r_j^p = 1$. See Hutchinson [78] or Falconer [39, §8.3].
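Since $p \mapsto \sum r_j^p$ is continuous and strictly decreasing, equal to $m$ at $p = 0$ and tending to 0 as $p \to \infty$, the number $p$ in the last sentence is well defined and easy to compute. A minimal bisection sketch (the function name is ours; this is a numerical aid, not part of the cited arguments):

```python
import math

def dimension_from_ratios(ratios, tol: float = 1e-12) -> float:
    """Solve sum(r_j ** p) = 1 for p >= 0 by bisection, assuming each 0 < r_j < 1."""
    def g(p: float) -> float:
        # Strictly decreasing in p, with g(0) = m - 1 >= 0 and g(p) -> -1 as p -> infinity.
        return sum(r ** p for r in ratios) - 1.0

    lo, hi = 0.0, 1.0
    while g(hi) > 0:  # enlarge the bracket until the sign changes
        hi *= 2.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if g(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

print(dimension_from_ratios([1 / 3, 1 / 3]))  # equal ratios: log 2 / log 3
print(dimension_from_ratios([1 / 2, 1 / 4]))  # log_2 of the golden ratio
```

With equal ratios $r_j = r$ the equation reduces to $m r^p = 1$, i.e. $p = \log_{1/r} m$ as in Theorem 11.33. With ratios $\frac12, \frac14$ one gets $2^{-p} = (\sqrt5 - 1)/2$, so $p$ is $\log_2$ of the golden ratio, about 0.694.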

Self-similar sets are among the simplest examples of “fractals.” Falconer [39] is 
a good reference for the geometric measure theory of fractals; see also Edgar [37], 
Falconer [40], and Mandelbrot [96] for other aspects of the theory of fractals. 

Continuous curves of Hausdorff dimension $> 1$ can be constructed from nondifferentiable functions. For example, if $f : [a, b] \to \mathbb{R}$ is Hölder continuous of exponent $\alpha$, $0 < \alpha < 1$, the graph of $f$ in $\mathbb{R}^2$ can have Hausdorff dimension as large as $2 - \alpha$, and the range of a sample path of the $n$-dimensional Wiener process ($n \ge 2$) almost surely has Hausdorff dimension 2. See Falconer [39, §§8.2,7].


§11.4: The theory of integration of differential forms can be found in a number of
books such as Warner [157] and Loomis and Sternberg [95]; the latter book also has 
a discussion of densities. 
Index of Notation 


For the basic notation used throughout the book for sets, mappings, numbers, and 
metric and topological spaces, see Chapter 0. Notation used only in the section in 
which it is introduced is, for the most part, not listed here. 


Analysis on Euclidean space: $x \cdot y$ (dot product), 235. $\partial^\alpha$, $x^\alpha$, $\alpha!$, $|\alpha|$
(multi-index notation), 236. $\mathbb{T}^n$ ($n$-torus), 238.


Functions and operations on functions: $f^+$, $f^-$ (positive and negative
parts), 46. sgn, 46. $\chi_E$ (characteristic function), 46. $f_x$, $f^y$ (sections), 65. $\Gamma$,
58. supp$(f)$ (support), 132, 284. $\lambda_f$ (distribution function), 197. $\tau_y f$ (translation),
238. $f * g$ (convolution), 239, 285. $\phi_t$ (dilation), 242. $\hat{f}$, $\mathcal{F}f$ (Fourier transform),
248, 249, 295. $\langle F, \phi \rangle$, 283. $\tilde{f}$ (reflection), 283.


Integrals: The basic notation is developed in §2.2. $\int f(x)\,dx$ (Lebesgue integral),
57, 70. $\iint f\,d\mu\,d\nu$ (iterated integral), 67. $\int g\,dF$ (Stieltjes integral), 107.


Measures: $\mu_F$ (Lebesgue-Stieltjes measure), 35. $m$, $m^n$ (Lebesgue measure),
37, 70. $\mu \times \nu$ (product), 64. $\sigma$ (surface measure on sphere), 78. $\nu^+$, $\nu^-$ (positive and
negative variations), 87. $|\nu|$ (total variation), 87, 93. $\mu \perp \nu$ (mutual singularity), 87.
$\nu \ll \mu$ (absolute continuity), 88. $f\,d\mu$, 89. $d\nu/d\mu$ (Radon-Nikodym derivative), 91.
supp$(\mu)$ (support), 215. $\mu \hat{\times} \nu$ (Radon product), 227. $\mu * \nu$ (convolution), 270.


Norms and seminorms: $\|f\|_u$ (uniform norm), 121. $\|T\|$ (operator norm),
154. $\|f\|_p$ ($L^p$ norm), 181. $\|f\|_\infty$ ($L^\infty$ norm), 184. $[f]_p$ (weak $L^p$ quasi-norm), 198.
$\|\mu\|$ (measure norm), 222. $\|\phi\|_{(N,\alpha)}$ (Schwartz space norm), 237. $\|f\|_{(s)}$ (Sobolev
norm), 302.
Probability theory: $E(X)$ (expectation), 314. $\sigma^2(X)$ (variance), 314. $P_X$
(image measure, distribution), 314. $\nu_\sigma$ (normal distribution), 325.


Sets: $F_\sigma$, $F_{\sigma\delta}$, $G_\delta$, $G_{\delta\sigma}$, 22. $E_x$, $E^y$ (sections), 65.


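σ-algebras: $\mathcal{M}(\mathcal{E})$ (σ-algebra generated by $\mathcal{E}$), 22. $\mathcal{B}_X$ (Borel sets), 22.
$\bigotimes_{\alpha \in A} \mathcal{M}_\alpha$, $\mathcal{M} \otimes \mathcal{N}$ (products), 22. $\mathcal{L}$, $\mathcal{L}^n$ (Lebesgue measurable sets), 37, 70.
$\mathcal{BA}_X$ (Baire sets), 215.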


Spaces of functions, measures, etc.: $L^+$, 49. $L^1$, 54, 181. $L^1_{\rm loc}$, 96.
BV, 102. NBV, 103. $C(X,Y)$, 119. $B(X,\mathbb{R})$, 121. $BC(X,\mathbb{R})$, 121. $B(X)$, 121.
$C(X)$, 121. $BC(X)$, 121. $C_c(X)$, 132. $C_0(X)$, 132. $L(X,Y)$, 154. $X^*$, 157. $L^2$,
172, 181. $l^2$, 173, 181. $L^p$, 181, 184. $l^p$, 181. $L^\infty$, 184. weak $L^p$, 198. $M(X)$, 222.
OE 235- CS, 235.2, 235. S237: D 287E 201: CS", 293: Ha 301. ASS: 


306. 








A 


Abel mean, 261 
Abel summation, 261 
Absolute continuity 

of a function, 105 

of a measure, 88 
Absolute convergence, 152 
Accumulation point, 114 
Adjoint, 160, 177 
A.e., 26 
Alaoglu’s theorem, 169 
Alexandroff compactification, 132 
Algebra 

Banach, 154 

of functions, 139 

of sets, 21 
Almost every(where), 26 
Almost sure(ly), 314 
Almost uniform convergence, 62 
Approximate identity, 245 
Arcwise connected set, 124 
Arzela-Ascoli theorem, 137 
A.s., 314 
Axiom 

of choice, 6 

of countability, 116 

of separation, 116 
B 


Baire category theorem, 161 
Baire classes, 83 

Baire set, 215 

Ball, 13 

Banach algebra, 154 

Banach space, 152 
Banach-Tarski paradox, 20 
Base for a topology, 115 
Bessel’s inequality, 175 
Bijective mapping, 4 
Binomial distribution, 320 
Bochner integral, 179 
Bochner-Riesz means, 279 
Bolzano-Weierstrass property, 15 
Borel isomorphism, 83 

Borel measurable function, 44 
Borel measure, 33

Borel set, 22 

Borel σ-algebra, 22

Borel’s normal number theorem, 330 
Borel-Cantelli lemma, 321 
Boundary, 114 

Bounded linear map, 153 
Bounded set, 15 

Bounded variation, 102 
C 


Cantor function, 39 
Cantor set, 38 
Cantor-Lebesgue function, 39 
Carathéodory’s theorem, 29 
Cartesian product, 3-4
Cauchy net, 167 
Cauchy sequence, 14 

in measure, 61 
Cauchy-Riemann equation, 308 
Central limit theorem, 326 
Cesàro mean, 262 
Cesàro summation, 262 
Characteristic function, 46 

of a probability distribution, 314 
Chebyshev’s inequality, 193 
Chi-square distribution, 320 
Closed graph theorem, 163 
Closed linear map, 163 
Closed set, 13, 114 
Closure, 13, 114 
Closure operator, 119 
Cluster point, 118, 126 
Coarser topology, 114 
Cofinal net, 127 
Cofinite topology, 113 
Commutator subgroup, 346 
Compact set, 16, 128 
Compact space, 128 

countably, 130 

locally, 131 

sequentially, 130 
Compactification, 144 

one-point, 132 

Stone-Čech, 144 
Compactly supported function, 132 
Complement, 3 
Complete measure, 26 
Complete metric space, 14 
Complete orthonormal set, 175 
Complete topological vector space, 167 
Completely regular algebra, 146 
Completely regular space, 123 
Completion 

of a measure, 27 

of a normed vector space, 159 

of a σ-algebra, 27
Complex measure, 93 
Composition, 3 
Condensation of singularities, 165 
Conditional expectation, 93 
Conjugate exponent, 183 
Connected component, 119 
Connected set, 118 
Constant-coefficient operator, 273 


Content, 72 
Continuity, 14, 119 

absolute, 88, 105 

Hölder, 138 

Lipschitz, 108 

of linear maps on $C_c^\infty$, 282

of measures, 26 

uniform, 14, 238, 340 
Continuous measure, 106 
Continuum, 8 
Continuum hypothesis, 17 
Convergence 

absolute, 152 

almost uniform, 62 

in $C_c^\infty$, 282

in $L^1$, 54

in measure, 61 

in probability, 314 

of a filter, 147 

of a net, 126 

of a sequence, 14, 116 

of a series, 152 
Convex function, 109 
Convolution 

of distributions, 285, 292 

of functions, 239 

of measures, 270 
Coordinate, 4 
Coordinate map, 4 
Countable additivity, 24 
Countable ordinals, 10 
Countable set, 7 
Countably compact space, 130 
Counting measure, 25 
Cover, 15 
Cube, 71, 143 


D 


Daniell integral, 81 
Decomposable measure, 92 
Decreasing function, 12 
Decreasing rearrangement, 199 
DeMorgan’s laws, 3 
Dense set, 13, 114 
Density, 362 

of a set, 100

Riemannian volume, 362 
Denumerable set, 7 
Derivative 

$L^p$, 246

of a distribution, 284 

Radon-Nikodym, 91 
Diameter, 15 
Diffeomorphism, 74 
Difference of sets, 3 
Differential operator, 273 








Dirac measure, 25 
Directed set, 125 
Dirichlet kernel, 264 
Dirichlet problem, 274 
Disconnected set, 118 
Discrete, 113 
Discrete measure, 106 
Discrete topology, 113 
Disjoint sets, 2 
Distribution function, 33, 197, 315 
Distribution, 282 
homogeneous, 289 
joint, 315 
of a random variable, 315 
periodic, 297 
tempered, 293 
Domain, 4 
Dominated convergence theorem, 54 
Dual space, 157 


E 


Egoroff’s theorem, 62 
Elementary family, 23 
Elliptic differential operator, 307, 311 
Elliptic regularity theorem, 307 
Embedding, 120 
Entropy, 325 
Equicontinuity, 137 
Equivalence class, 3 
Equivalence relation, 3 
Equivalent metric, 16 
Equivalent norm, 152 
Essential range, 187 
Essential supremum, 184 
Event, 314 
Eventually, 126 
Expectation, 314 
conditional, 93 
Extended integrable function, 86 
Extended real number system, 10 


F 


$F_\sigma$ and $F_{\sigma\delta}$ sets, 22

Fatou’s lemma, 52 

Fejér kernel, 269 

Field of sets, 21 

Filter, 147 

Finer topology, 114 

Finite intersection property, 128 
Finite measure, 25 

Finite signed measure, 88 
Finitely additive measure, 25 
First category, 161 

First countable space, 116 
First uncountable ordinal, 10 
Fourier integral, 253 
Fourier inversion theorem, 251 
Fourier series, 248 
Fourier transform, 248-249

of a measure, 272 

of a tempered distribution, 295 
Fourier-Stieltjes transform, 272 
Fractional integral, 77 
Frequently, 126 
Fréchet space, 167 
Fubini’s theorem, 67-68, 229 
Fubini-Tonelli theorem, 67-68, 229
Function, 3 
Fundamental theorem of calculus, 106 


G 


$G_\delta$ and $G_{\delta\sigma}$ sets, 22
Gamma distribution, 320 
Gamma function, 58 
Gauge, 82 

Gauss kernel, 260 

Gaussian distribution, 325 
Generalized Cantor set, 39
Gibbs phenomenon, 268 
Gram-Schmidt process, 175 
Graph of a linear map, 162 


H 


H-interval, 33 

Haar measure, 341 

Hahn decomposition, 87 

Hahn decomposition theorem, 86 
Hahn-Banach theorem, 157-158 
Hardy’s inequalities, 196 


Hardy-Littlewood maximal function, 96 


Hausdorff dimension, 351 
Hausdorff maximal principle, 5 
Hausdorff measure, 350 
Hausdorff space, 117 
Hausdorff-Young inequality, 248
Heat equation, 275 
Heine-Borel property, 15 
Heisenberg’s inequality, 255 
Henstock-Kurzweil integral, 82 
Hermite function, 256 

Hermite operator, 256 

Hermite polynomial, 257 
Hilbert space, 172 

Hilbert’s inequality, 196 
Homeomorphism, 119 
Homogeneous distribution, 289 
Hull of an ideal, 142 

Hölder continuity, 138
Hölder’s inequality, 182, 196 


I


Ideal in an algebra, 142
Identically distributed random variables, 315
Iff, 1
Image, 3
Image measure, 314
Increasing function, 12
Independent events, 315
Independent random variables, 315
Indicator function, 46
Indiscrete topology, 113
Inequality
Bessel’s, 175
Chebyshev’s, 193
Hardy’s, 196
Hausdorff-Young, 248
Heisenberg’s, 255
Hilbert’s, 196
Hölder’s, 182, 196
Jensen’s, 109
Kolmogorov’s, 322
Minkowski’s, 183, 194
Schwarz, 172
triangle, 151
Wirtinger’s, 254
Young’s, 240-241
Infimum, 9-10
Initial segment, 9
Injective mapping, 4
Inner measure, 32
Inner product, 171
Inner regular measure, 212
Integrable function, 53
weakly, 179
extended, 86
locally, 95
Integral
Bochner, 179
Daniell, 81
fractional, 77
Henstock-Kurzweil, 82
Lebesgue, 56
Lebesgue-Stieltjes, 107
of a complex function, 53
of a nonnegative function, 50
of a real function, 53
of a simple function, 49
Riemann, 57
Interior, 13, 114
Invariant set, 355
Inverse, 4
Inverse image, 3
Invertible linear map, 154
Isometry, 154
Isomorphism
Borel, 83
linear, 154
order, 5
unitary, 176


J


Jensen’s inequality, 109
Joint distribution, 315
Jordan content, 72
Jordan decomposition
of a function, 103
of a signed measure, 87
Jordan decomposition theorem, 87


K


Kernel of a closed set, 142
Kolmogorov’s inequality, 322
Krein extension theorem, 161


L


$L^p$ derivative, 246
$L^p$ norm, 181, 184
$L^p$ space, 181, 184
intrinsic, 363
weak, 198
Laplacian, 273
Lattice of functions, 139
Law of large numbers
strong, 322-323
weak, 321
Law of the iterated logarithm, 336
LCH space, 131
Lebesgue decomposition, 91
Lebesgue differentiation theorem, 98
Lebesgue integral, 56
Lebesgue measurable function, 44
Lebesgue measurable set, 37, 70
Lebesgue measure, 37, 70
Lebesgue set, 97
Lebesgue-Radon-Nikodym theorem, 90, 93
Lebesgue-Stieltjes integral, 107
Lebesgue-Stieltjes measure, 35
Left continuous function, 12
Left-invariant measure, 341
Lemma
Borel-Cantelli, 321
Fatou’s, 52
monotone class, 66
Riemann-Lebesgue, 249
three lines, 200
Urysohn’s, 122, 131, 245
Weyl’s, 308
Zorn’s, 5
Limit inferior, 2, 11
Limit of a net, 126
Limit superior, 2, 11

Linear functional, 157 
positive, 211 

Linear ordering, 5 

Liouville’s theorem, 300 

Lipschitz continuity, 108 

Localized Sobolev space, 306 

Locally compact group, 341 

Locally compact space, 131 

Locally convex space, 165 

Locally finite cover, 135 

Locally integrable function, 95 

Locally measurable set, 28 

Locally null set, 192 

Lower bound, 5 

Lower semicontinuous function, 218 

LSC function, 218 

Lusin’s theorem, 64, 217 


M 


Map, 3 
Mapping, 3 
Marcinkiewicz interpolation theorem, 202 
Maximal element, 5 
Maximal function, 96, 246 
Maximal theorem, 96 
Meager set, 161 
Mean ergodic theorem, 178 
Mean, 314-315 
Measurable function, 44 
Measurable mapping, 43 
Measurable set, 25 
Lebesgue, 37, 70 
locally, 28 
with respect to an outer measure, 29 
Measurable space, 25 
Measure, 24 
Borel, 33 
complete, 26 
complex, 93 
continuous, 106 
counting, 25 
decomposable, 92 
discrete, 106
Dirac, 25 
finitely additive, 25 
Hausdorff, 350 
inner, 32 
inner regular, 212 
Lebesgue, 37, 70 
Lebesgue-Stieltjes, 35 
outer, 28 
outer regular, 212 
positive, 85 
Radon, 212 
regular, 99, 212 
semifinite, 25 

σ-finite, 25

signed, 85 

singular, 87 

smooth, 361 
Measure space, 25 
Metric, 13 
Metric outer measure, 349 
Metric space, 13 
Minimal element, 5 
Minkowski’s inequality, 183 

for integrals, 194 
Modular function, 346 
Moment convergence theorem, 320 
Monotone class, 65 
Monotone class lemma, 66 
Monotone convergence theorem, 50 
Monotone function, 12 
Monotonicity of measures, 25 
Multi-index, 236 
Mutually singular measures, 87 


N 


Negative part of a function, 46 
Negative set, 86 
Negative variation 

of a function, 103 

of a signed measure, 87 
Neighborhood, 114 
Neighborhood base, 114 
Net, 125 
Norm, 152 

LP, 181, 184 

operator, 154 

product, 153 

quotient, 153 

uniform, 121 
Norm topology, 152 
Normal distribution, 325 
Normal number, 330 
Normal space, 117 
Normed linear space, 152 
Normed vector space, 152 
Nowhere dense set, 13, 114 
Null set, 26, 86 

locally, 192 


O 


One-point compactification, 132 
Open map, 162 

Open mapping theorem, 162 
Open set, 12-13, 114 

Operator norm, 154 

Order isomorphism, 5 

Order topology, 118 

Ordinal, 10 
Orientable manifold, 362 
Orientation, 362 
Orthogonal projection, 177 
Orthogonal set, 173 
Orthonormal basis, 176 
Orthonormal set, 175 
Outer measure, 28 

metric, 349 
Outer regular measure, 212 


P 


Paracompact space, 135 
Parallelogram law, 173 
Parametrization, 352 
Parseval’s identity, 175 
Partial ordering, 4 
Partition 

of an interval, 56 

of unity, 134 

tagged, 82 
Periodic distribution, 297 
Periodic function, 238 
Plancherel theorem, 252 
Point mass, 25 
Pointwise bounded family, 137 
Poisson distribution, 320 
Poisson kernel, 260, 262 


Poisson summation formula, 254 


Polar coordinates, 78 
Polar decomposition, 46 
Positive definite function, 272 
Positive linear functional, 211 
Positive measure, 85 
Positive part of a function, 46 
Positive set, 86 
Positive variation 

of a function, 103 

of a signed measure, 87 
Pre-Hilbert space, 172 
Precompact set, 128 
Predecessor, 9
Premeasure, 30 
Principal symbol, 306 
Probability measure, 313 
Product measure, 64 
Product metric, 13 
Product norm, 153 
Product σ-algebra, 22
Product topology, 120 
Projection, 4 

orthogonal, 177 
Proper map, 135 
Pythagorean theorem, 173 


Q 


Quotient norm, 153 


Quotient space, 153 
Quotient topology, 124 


R 


Radon measure, 212 

complex, 222 
Radon product, 227 
Radon-Nikodym derivative, 91 
Radon-Nikodym theorem, 91 
Random variable, 314 
Range, 4 
Real-analytic function, 263 
Rectangle, 64 
Refinement of a cover, 135 
Reflexive space, 159 
Regular measure, 99, 212 
Regular space, 117 
Relation, 3 
Relative topology, 114 
Rellich’s theorem, 305 
Residual set, 161 
Reverse inclusion, 125 
Riemann integrable function, 57 
Riemann integral, 57 
Riemann-Lebesgue lemma, 249 
Riemannian volume density, 362 
Riesz representation theorem, 212, 223 
Riesz-Thorin interpolation theorem, 200 
Right continuous function, 12 
Right-invariant measure, 341 
Ring of sets, 24 


S 


Sample mean, 325 
Sample space, 314 
Sample variance, 325 
Sampling theorem, 255 
Saturated measure, 28 
Saturation of a measure, 28 
Scalar product, 171 
Schröder-Bernstein theorem, 7 
Schwartz space, 236 
Schwarz inequality, 172 
Second category, 161 
Second countable space, 116 
Section of a set or function, 65 
Semifinite measure, 25 
Semifinite part, 27 
Seminorm, 151 
Separable space, 14, 116 
Separating set, 359 
Separation 

of points, 139 

of points and closed sets, 143 
Sequence, 4 
Sequentially compact space, 130 











Shannon’s theorem, 324 
Shrink nicely, 98 
Sides of a rectangle, 70 
Sierpinski gasket, 356 
σ-algebra, 21
Borel, 22
generated by a family of functions, 44
generated by a family of sets, 22
of countable or co-countable sets, 21
product, 22
σ-compact space, 133
σ-field, 21
σ-finite measure, 25
σ-finite set, 25
σ-finite signed measure, 88
σ-ring, 24
Signed measure, 85 
Similitude, 355 
Simple function, 46 
Singular measure, 87 
Slowly increasing function, 294 
Smooth measure, 361 
Snowflake curve, 356 
Sobolev embedding theorem, 303, 308 
Sobolev space, 301 
localized, 306 
Standard deviation, 314 
Standard normal distribution, 325 


Standard representation of a simple function, 46 


Stirling’s formula, 327 
Stone-Čech compactification, 144
Stone-Weierstrass theorem, 139, 141
Strong law of large numbers, 322-323 
Strong operator topology, 169 
Strong type, 202 
Stronger topology, 114 
Subadditivity, 25 
Subbase for a topology, 114 
Sublinear functional, 157 
Sublinear map, 202 
Submanifold, 351 
Subnet, 126 
Subordination, 134 
Subsequence, 4 
Subspace of a vector space, 151 
Support 

of a distribution, 284 

of a function, 132 

of a measure, 215 
Supremum, 9-10 
Surjective mapping, 4 
Symbol, 273 

principal, 306 
Symmetric difference, 3 
Symmetric neighborhood, 339 
T 


$T_0, \ldots, T_4$ space, 116, 123
Tagged partition, 82 
Tempered distribution, 293 
Tempered function, 293 
Theorem 
Alaoglu’s, 169 
Arzela-Ascoli, 137 
Baire category, 161 
Carathéodory’s, 29 
central limit, 326 
closed graph, 163 
dominated convergence, 54 
Egoroff’s, 62 
Fourier inversion, 251 
Fubini-Tonelli, 67-68, 229 
Hahn decomposition, 86 
Hahn-Banach, 157-158 
Jordan decomposition, 87 
Krein extension, 161 
Lebesgue differentiation, 98 
Lebesgue-Radon-Nikodym, 90, 93 
Liouville’s, 300 
Lusin’s, 64, 217 
Marcinkiewicz interpolation, 202 
maximal, 96 
moment convergence, 320 
monotone convergence, 50 
open mapping, 162 
Plancherel, 252 
Pythagorean, 173 
Radon-Nikodym, 91 
Rellich’s, 305 
Riesz representation, 212, 223 
Riesz-Thorin interpolation, 200 
sampling, 255 
Schröder-Bernstein, 7
Shannon’s, 324 
Sobolev embedding, 303, 308 
Stone-Weierstrass, 139, 141 
Tietze extension, 122, 131 
Tychonoff’s, 136 
Urysohn metrization, 145 
Vitali convergence, 187 
Vitali covering, 110 
Weierstrass approximation, 141, 318 
Three lines lemma, 200 
Tietze extension theorem, 122, 131 
Tonelli’s theorem, 67-68, 229 
Topological group, 339 
Topological space, 113 
Topological vector space, 165 
Topology, 113 
cofinite, 113 
generated by a family of sets, 114 
indiscrete, 113
norm, 152
of uniform convergence, 133
of uniform convergence on compact sets, 133
product, 120
quotient, 124
relative, 114
strong operator, 169
trivial, 113
vague, 223
weak operator, 169
weak, 120, 168
weak*, 169
Zariski, 117
Torus, 238
Total ordering, 5
Total variation
of a complex measure, 93
of a function, 102
of a signed measure, 87
Totally bounded set, 15
Transfinite induction, 9
Transpose, 160
Triangle inequality, 151
Trivial topology, 113
Tychonoff space, 123
Tychonoff’s theorem, 136


U


Uniform boundedness principle, 163
Uniform continuity, 238, 340
Uniform integrability, 92
Uniform norm, 121
Unimodular group, 346
Unitary map, 176
Upper bound, 5
Upper semicontinuous function, 218
Urysohn metrization theorem, 145
Urysohn’s lemma, 122, 131, 245
USC function, 218


V


Vague topology, 223
Vanish at infinity, 132
Variance, 314-315
Vitali convergence theorem, 187
Vitali covering theorem, 110


W


Wave equation, 275
Weak convergence, 169
Weak $L^p$, 198
Weak law of large numbers, 321
Weak operator topology, 169
Weak topology, 120, 168
Weak type, 202
Weak* topology, 169
Weaker topology, 114
Weierstrass approximation theorem, 141, 318
Weierstrass kernel, 260
Well ordering principle, 5
Well ordering, 5
Weyl’s lemma, 308
Wiener process, 332
abstract, 331
Wirtinger’s inequality, 254


Y


Young’s inequality, 240-241


Z


Zariski topology, 117
Zorn’s lemma, 5